Abstract

An information-theoretic approach is proposed to watermark embedding and detection under
limited detector resources. First, we consider the attack-free scenario, for which asymptotically
optimal decision regions in the Neyman-Pearson sense are proposed, along with the optimal embedding
rule. Later, we explore the case of a zero-mean i.i.d. Gaussian covertext distribution with unknown
variance under the attack-free scenario. For this case, we propose a lower bound on the exponential
decay rate of the false-negative probability and prove that the optimal embedding and detection
strategy is superior to the customary linear, additive embedding strategy in the exponential sense.
Finally, these results are extended to the case of memoryless attacks and general worst-case attacks.
Optimal decision regions and embedding rules are offered, and the worst attack channel is identified.

1 Introduction 

Information embedding and watermarking has become a very active field of research in the
last decade, both in the academic community and in industry, due to the need to protect the vast
amount of digital information available over the Internet and other data storage media and devices (see,
e.g., [1]). Watermarking (WM) is a form of embedding information secretly in a host data set (e.g.,
image, audio signal, video, etc.). In this work, we raise and examine certain fundamental questions with
regard to customary methods of embedding and detection and suggest some new ideas for the most basic
setup.

Consider the system depicted in Fig. 1: Let $\mathbf{x} = (x_1, \ldots, x_n)$ denote a covertext sequence emitted
from a memoryless source $P_X$, and let $\mathbf{u} = (u_1, \ldots, u_n)$ denote a watermark sequence available at the
embedder and at the detector. Our work focuses on finding the optimal embedding and detection rules for
the following binary hypothesis problem: under hypothesis $H_1$, the stegotext sequence $\mathbf{y} = (y_1, \ldots, y_n)$
is "watermarked" using the embedder $\mathbf{y} = f_n(\mathbf{x}, \mathbf{u})$, while under $H_0$, $\mathbf{y} = \mathbf{x}$, i.e., the stegotext sequence
is not "watermarked". An attack channel $W_n(\mathbf{z}|\mathbf{y})$, fed by the stegotext, produces a forgery $\mathbf{z}$, which in
turn is observed by the detector. Now, given the forgery sequence $\mathbf{z}$ and the watermark sequence $\mathbf{u}$, the
detector needs to decide whether the forgery is "watermarked" or not. Performance is evaluated under the
Neyman-Pearson criterion, namely, minimum false-negative probability while the false-positive (false-alarm)
probability is kept below a prescribed level. The problem is addressed under different statistical assumptions:
the covertext distribution is known or unknown to the embedder/detector, the attack channel is known
to be a memoryless attack or it is a general attack channel, and the watermark sequence is deterministic
or random.
Figure 1: The watermarking and detection problem.



Surprisingly, this problem has not received much attention in the information theory community. In
[5], the problem of universal detection of messages via a finite-state channel was considered, and an optimal
decision rule was proposed for deciding whether the observed sequence is the product of an unknown
finite-state channel fed by one of two predefined sequences. Liu and Moulin [6], [7] explored the error exponent
of two popular one-bit WM systems: the spread-spectrum scheme and the quantized-index-modulation
(QIM) watermarking scheme, under a general additive attack. Bounds and closed-form expressions were
offered for the error exponents. We note that the setting of [6] is different from ours: here, we are trying
to find the best embedder given limited detection resources under the Neyman-Pearson criterion of optimality,
while in [6], the performance (the error exponent) of given embedding schemes and a given source distribution
is evaluated under additive attacks. In [8], the problem of embedding/detection was formulated under
limited detection resources, and the optimal decision region and the optimal embedding rule were offered
for the attack-free scenario.

Many researchers from the signal/image processing community (e.g., [2], [3], [5]- [13] , [HI Sec. 4. 2] and 
references therein) have devoted research efforts to explore the problem of optimal watermark embedding 
and detection with one common assumption: the watermark embedding rule is normally taken to be 
additive (linear), i.e., the stegotext vector y is given by 

$$\mathbf{y} = \mathbf{x} + \gamma \mathbf{u} \tag{1}$$

or multiplicative, where each component of $\mathbf{y}$ is given by

$$y_i = x_i (1 + \gamma u_i), \quad i = 1, \ldots, n, \tag{2}$$

where in both cases, $u_i = \pm 1$, and the choice of $\gamma$ controls the tradeoff between quality of the stego-signal
(in terms of the distortion relative to the covertext signal $\mathbf{x}$) and the detectability of the watermark, i.e., the
"signal-to-noise" ratio.

Once the linear embedder (1) is adopted, elementary detection theory tells us that the optimal
likelihood-ratio detector under the attack-free scenario (i.e., $\mathbf{z} = \mathbf{y}$), assuming a zero-mean, Gaussian,
i.i.d. covertext distribution, is a correlation detector, which decides positively ($H_1$: $\mathbf{y} = \mathbf{x} + \gamma\mathbf{u}$)
if the correlation $\sum_{i=1}^n u_i y_i$ exceeds a certain threshold, and negatively ($H_0$: $\mathbf{y} = \mathbf{x}$) otherwise. The
reason is that in this case, $\mathbf{x}$ simply plays the role of additive noise (the additive embedding scheme is,
in fact, the spread-spectrum modulation technique [15], in which the covertext is treated as additive
noise). In a similar manner, the optimal test for the multiplicative embedder (2) is based on the different
variances of the $y_i$'s corresponding to $u_i = +1$ relative to those corresponding to $u_i = -1$, the former
being $\sigma^2(1+\gamma)^2$ and the latter being $\sigma^2(1-\gamma)^2$, where $\sigma^2$ is the variance of each component of $\mathbf{x}$.
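As an illustration only, the two customary embedders (1) and (2) and the correlation detector described above can be sketched as follows. The function names, the random seed, the parameter values and the midpoint threshold are assumptions made for this sketch; in particular, the threshold is not derived from any false-positive constraint.

```python
import random

def embed_additive(x, u, gamma):
    """Additive rule (1): y_i = x_i + gamma * u_i."""
    return [xi + gamma * ui for xi, ui in zip(x, u)]

def embed_multiplicative(x, u, gamma):
    """Multiplicative rule (2): y_i = x_i * (1 + gamma * u_i)."""
    return [xi * (1.0 + gamma * ui) for xi, ui in zip(x, u)]

def correlation(u, y):
    """Correlation statistic sum_i u_i * y_i."""
    return sum(ui * yi for ui, yi in zip(u, y))

def correlation_detector(u, y, threshold):
    """Return True (decide H1) when the correlation exceeds the threshold."""
    return correlation(u, y) > threshold

# Assumed toy parameters (not taken from the paper).
rng = random.Random(0)
n, sigma, gamma = 2000, 1.0, 0.5
u = [rng.choice((-1, 1)) for _ in range(n)]   # watermark in {-1, +1}^n
x = [rng.gauss(0.0, sigma) for _ in range(n)] # zero-mean Gaussian covertext
y = embed_additive(x, u, gamma)
threshold = n * gamma / 2  # illustrative midpoint, not a Neyman-Pearson threshold
```

Since $u_i^2 = 1$, additive embedding shifts the correlation by exactly $n\gamma$, which is what makes a simple threshold test effective in this setting.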

While in classical detection theory, the additivity (1) (or somewhat less commonly, the multiplicativity
(2)) of the noise is part of the channel model, and hence cannot be controlled, this is not quite the case in
watermark embedding, where one has, at least in principle, the freedom to design an arbitrary embedding
function $\mathbf{y} = f_n(\mathbf{x}, \mathbf{u})$, trading off the quality of $\mathbf{y}$ and the detectability of $\mathbf{u}$. Clearly, for an arbitrary
choice of $f_n$, the above described detectors are no longer optimal in general.

Malvar and Florencio [16] have noticed that better performance can be gained if $\gamma$ is chosen as
a function of the watermark and the covertext. However, their choice does not lead to the optimal
performance, as will be shown later. Recently, Furon [17] explored the zero-bit watermark problem using
a different setting, in which the watermark sequence is a function of the covertext, and under a different
criterion of optimality.

While many papers in the literature addressed the problem of computing the performance of different
embedding and detection strategies and plotting their receiver operating characteristics (ROC) for different
values of the problem dimension $n$ (see, e.g., [TT],[T5],[T5] and references therein), very few works [6], [7]
deal with the optimal asymptotic behavior of the two kinds of error probabilities, i.e., the exponential
decay rate of the two kinds of error probabilities as $n$ tends to infinity.

The problem of finding the optimum watermark embedder $f_n$ for reliable WM detection is not trivial:
the probabilities of errors of the two kinds (false positive and false negative) corresponding to the
likelihood-ratio detector induced by a given $f_n$ are, in general, hard to compute, and a fortiori hard
to optimize in closed form. Moreover, obtaining closed-form expressions for the optimal embedder and
decision regions when the covertext distribution is unknown is even harder (see Section 2 for more details).

Thus, instead of striving to seek the strictly optimum embedder, we take the following approach:
Suppose that one would like to limit the complexity of the detector by confining its decision to depend
on a given set of statistics computed from $\mathbf{z}$ and $\mathbf{u}$. Examples are the energy of $\mathbf{z}$, $\sum_{i=1}^n z_i^2$, and the
correlation $\sum_{i=1}^n u_i z_i$, which are the sufficient statistics used by the above described correlation detector.
Other possible statistics are those corresponding to the likelihood-ratio detector of (2), namely, the
energies $\sum_{i: u_i = +1} z_i^2$ and $\sum_{i: u_i = -1} z_i^2$, and so on. Within the class of detectors based on a given set
of statistics, we present the optimal (in the Neyman-Pearson sense) embedder and its corresponding
detector for different settings of the problem.

First, we formulate the embedding and detection problem under the attack free scenario. We devise 
an asymptotically optimal detector and embedding rule among all detectors which base their decisions 
on the empirical joint distribution of z and u. This modeling assumption, where the detector has access 
to a limited set of empirical statistics of u and z, has two motivations. First, it enables a fair comparison 
(in terms of detection computational resources) to different embedding/detection methods reported in 
the literature of WM in which most of the detectors use a similar set of statistics (mostly, correlation 
and energy) to base their decisions. Second, this approach highlights the tradeoff between detection
complexity and performance: extending the set of statistics on which the detector can base its decisions
might improve the system performance, but it also increases the detector's complexity.

Later, we discuss different aspects of the basic problem, namely, practical issues regarding the
implementability of the embedder, universality w.r.t. the covertext distribution, other detector statistics, and
the case where the watermark sequence is random too. These results are obtained by extending the
techniques presented in [S],[T5]-[3T], which are closely related to universal hypothesis testing problems. We
apply these results to a zero-mean i.i.d. Gaussian covertext distribution with unknown variance. We
propose a closed-form expression for the optimal embedder, and suggest a lower bound on the false-negative
probability error exponent. By analyzing the error exponent of the additive embedder and using the
suggested lower bound, we show that the optimal embedder is superior to the customary additive embedder
in the exponential sense. Finally, we extend these results to memoryless attack channels and worst-case
general attack channels. The worst attack channel is identified and optimal embedding and detection
rules are offered. The model of general worst-case attack channels, treated here, was already considered
in the WM literature but in a different context. In [22], general attack channels were considered, and
the capacity and random-coding error exponent were derived for the private watermarking game under
general attack channels. In [23], the capacity of the public watermarking game under general attack channels
was derived for constant composition codes. This paper is a further development and an extension of [5],
[21], and it gives a detailed account of the results of [25].

2 Basic Derivation 

We begin with some notation and definitions. Throughout this work, capital letters represent scalar
random variables (RVs) and specific realizations of them are denoted by the corresponding lowercase
letters. Random vectors of dimension $n$ will be denoted by bold-face letters. The notation $1\{A\}$, where
$A$ is an event, will designate the indicator function of $A$ (i.e., $1\{A\} = 1$ if $A$ occurs and $1\{A\} = 0$
otherwise). We adopt the following conventions: The minimum (maximum) of a function over an empty
set is understood to be $\infty$ ($-\infty$). The notation $a_n \doteq b_n$, for two positive sequences $\{a_n\}_{n \ge 1}$ and $\{b_n\}_{n \ge 1}$,
expresses asymptotic equality in the logarithmic scale, i.e.,

$$\lim_{n \to \infty} \frac{1}{n} \ln\left(\frac{a_n}{b_n}\right) = 0.$$
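Let the vector $\hat{P}_{\mathbf{x}} = \{\hat{P}_{\mathbf{x}}(a),\ a \in \mathcal{X}\}$ denote the empirical distribution induced by a vector $\mathbf{x} \in \mathcal{X}^n$,
where $\hat{P}_{\mathbf{x}}(a) = \frac{1}{n}\sum_{i=1}^n 1\{x_i = a\}$. The type class $T(\mathbf{x})$ is the set of vectors $\tilde{\mathbf{x}} \in \mathcal{X}^n$ such that $\hat{P}_{\tilde{\mathbf{x}}} = \hat{P}_{\mathbf{x}}$.
Similarly, the joint empirical distribution induced by $(\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n$ is the vector

$$\hat{P}_{\mathbf{x}\mathbf{y}} = \{\hat{P}_{\mathbf{x}\mathbf{y}}(a, b),\ a \in \mathcal{X},\ b \in \mathcal{Y}\}, \tag{3}$$

where

$$\hat{P}_{\mathbf{x}\mathbf{y}}(a, b) = \frac{1}{n}\sum_{i=1}^n 1\{x_i = a,\ y_i = b\}, \quad a \in \mathcal{X},\ b \in \mathcal{Y}, \tag{4}$$

i.e., $\hat{P}_{\mathbf{x}\mathbf{y}}(a, b)$ is the relative frequency of the pair $(a, b)$ along the pair sequence $(\mathbf{x}, \mathbf{y})$. Likewise, the type
class $T(\mathbf{x}, \mathbf{y})$ is the set of all pairs $(\tilde{\mathbf{x}}, \tilde{\mathbf{y}}) \in \mathcal{X}^n \times \mathcal{Y}^n$ such that $\hat{P}_{\tilde{\mathbf{x}}\tilde{\mathbf{y}}} = \hat{P}_{\mathbf{x}\mathbf{y}}$. The conditional type class
$T(\mathbf{y}|\mathbf{x})$, for given vectors $\mathbf{x} \in \mathcal{X}^n$ and $\mathbf{y} \in \mathcal{Y}^n$, is the set of all vectors $\tilde{\mathbf{y}} \in \mathcal{Y}^n$ such that $T(\mathbf{x}, \tilde{\mathbf{y}}) = T(\mathbf{x}, \mathbf{y})$.
We denote by $\hat{E}_{\mathbf{x}\mathbf{y}}(\cdot)$ expectation with respect to the empirical joint distribution $\hat{P}_{\mathbf{x}\mathbf{y}}$. The Kullback-Leibler
divergence between two distributions $P$ and $Q$ on $\mathcal{A}$, where $|\mathcal{A}| < \infty$, is defined as

$$\mathcal{D}(P \| Q) = \sum_{a \in \mathcal{A}} P(a) \ln \frac{P(a)}{Q(a)},$$

with the conventions that $0 \ln 0 = 0$, and $p \ln \frac{p}{0} = \infty$ if $p > 0$. We denote the empirical entropy of a
vector $\mathbf{x} \in \mathcal{X}^n$ by $\hat{H}_{\mathbf{x}}(X)$, where

$$\hat{H}_{\mathbf{x}}(X) = -\sum_{a \in \mathcal{X}} \hat{P}_{\mathbf{x}}(a) \ln \hat{P}_{\mathbf{x}}(a).$$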



Other information theoretic quantities governed by empirical distributions (e.g., conditional empirical 
entropy, empirical mutual information) will be denoted similarly. 
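The empirical quantities defined above are straightforward to compute from data. The following is a minimal sketch, with hypothetical helper names and a toy example chosen for this illustration; entropies are in nats, as in the text.

```python
from collections import Counter
from math import log

def empirical_dist(seq):
    """Relative frequency of each symbol (or tuple of symbols) in seq."""
    n = len(seq)
    return {a: c / n for a, c in Counter(seq).items()}

def joint_empirical_dist(xs, ys):
    """Joint empirical distribution of the pair sequence (xs, ys), eq. (4)."""
    return empirical_dist(list(zip(xs, ys)))

def entropy(p):
    """Entropy in nats of a distribution given as a dict, with 0 ln 0 = 0."""
    return -sum(v * log(v) for v in p.values() if v > 0)

def conditional_entropy(us, ys):
    """Empirical conditional entropy H(Y|U) = H(U,Y) - H(U)."""
    return entropy(joint_empirical_dist(us, ys)) - entropy(empirical_dist(us))

def kl_divergence(p, q):
    """D(p || q) with the conventions 0 ln 0 = 0 and p ln(p/0) = infinity."""
    total = 0.0
    for a, pa in p.items():
        if pa == 0:
            continue
        qa = q.get(a, 0.0)
        if qa == 0:
            return float("inf")
        total += pa * log(pa / qa)
    return total

# Toy example (assumed values).
u = [1, 1, -1, -1]
y = [0, 1, 0, 0]
h_y_given_u = conditional_entropy(u, y)            # equals 0.5 * ln 2 here
d = kl_divergence(empirical_dist(y), {0: 0.5, 1: 0.5})
```

In the toy example, $y$ is deterministic on the half of the positions where $u_i = -1$ and uniform on the other half, so the conditional entropy is half of $\ln 2$.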

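For two vectors $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$, the Euclidean inner product is defined as $\langle \mathbf{a}, \mathbf{b} \rangle = \sum_{i=1}^n a_i b_i$ and the
$L_2$-norm of a vector is defined as $\|\mathbf{a}\| = \sqrt{\langle \mathbf{a}, \mathbf{a} \rangle}$. Let $\mathrm{Vol}\{A\}$ denote the volume of a set $A \subset \mathbb{R}^n$, i.e.,
$\mathrm{Vol}\{A\} = \int_A d\mathbf{x}$. We denote by $\mathrm{sgn}(\cdot)$ the signum function, where $\mathrm{sgn}(x) = 1\{x > 0\} - 1\{x < 0\}$.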

Throughout this paper, and without essential loss of generality, we assume that the components of $\mathbf{x}$,
$\mathbf{y}$, and $\mathbf{z}$ all take on values in the same finite alphabet $\mathcal{A}$. In Section 4, the assumption that $\mathcal{A}$ is finite
will be dropped, and $\mathcal{A}$ will be allowed to be an infinite set, like the real line. The components of the
watermark $\mathbf{u}$ will always take on values in $\mathcal{B} = \{-1, +1\}$, as mentioned earlier. Let us further assume
that $\mathbf{x}$ is drawn from a given memoryless source $P_X$.

Throughout the sequel, until Section 5 (exclusively), we assume that there is no attack, i.e., the
channel $W_n(\mathbf{z}|\mathbf{y})$ is the identity channel:

$$W_n(\mathbf{z}|\mathbf{y}) = \begin{cases} 1, & \mathbf{z} = \mathbf{y} \\ 0, & \text{else.} \end{cases}$$

This is referred to as the attack-free scenario. In this scenario, the detector will use y and u to base its 
decisions. 

For a given $\mathbf{u} \in \mathcal{B}^n$, we would like to devise a decision rule that partitions the space $\mathcal{A}^n$ of sequences
$\{\mathbf{y}\}$, observed by the detector, into two complementary regions, $\Lambda$ and $\Lambda^c$, such that for $\mathbf{y} \in \Lambda$, we decide
in favor of $H_1$ (watermark $\mathbf{u}$ is present) and for $\mathbf{y} \in \Lambda^c$, we decide in favor of $H_0$ (watermark absent:
$\mathbf{y} = \mathbf{x}$). Consider the Neyman-Pearson criterion of minimizing the false-negative probability

$$P_{fn} = \sum_{\mathbf{x}:\ f_n(\mathbf{x}, \mathbf{u}) \in \Lambda^c} P_X(\mathbf{x}) \tag{5}$$

subject to the following constraints:

(1) Given a certain distortion measure $d_e(\cdot, \cdot)$ and distortion level $D_e$, the distortion between $\mathbf{x}$ and $\mathbf{y}$,
$d_e(\mathbf{x}, \mathbf{y}) = d_e(\mathbf{x}, f_n(\mathbf{x}, \mathbf{u}))$, does not exceed $nD_e$.

(2) The false-positive probability is upper bounded by

$$P_{fp} = \sum_{\mathbf{y} \in \Lambda} P_X(\mathbf{y}) \le e^{-\lambda n}, \tag{6}$$

where $\lambda > 0$ is a prescribed constant.

In other words, we would like to choose $f_n$ and $\Lambda$ so as to minimize $P_{fn}$ subject to a distortion constraint
and the constraint that the exponential decay rate of $P_{fp}$ would be at least as large as $\lambda$.

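Clearly, the problem is a classical hypothesis testing problem (under the Neyman-Pearson criterion of
optimality), with the following hypotheses: $H_0$: $\mathbf{y} = \mathbf{x}$ (the covertext is not "marked") and $H_1$: $\mathbf{y} = f_n(\mathbf{x}, \mathbf{u})$
(the covertext is "marked"). Given $f_n$ and $\mathbf{u}$, we can define the conditional distribution of $\mathbf{y}$ under the
two hypotheses:

$$P(\mathbf{y}|H_0) = P_X(\mathbf{y}), \qquad P(\mathbf{y}|H_1) = \sum_{\mathbf{x}:\ f_n(\mathbf{x}, \mathbf{u}) = \mathbf{y}} P_X(\mathbf{x}),$$

where $P_X(\mathbf{x})$ is the covertext distribution. The optimal test which minimizes the false-negative probability
under the Neyman-Pearson criterion of optimality is the likelihood ratio test (LRT) [26, p. 34]:

$$L(\mathbf{y}) = \frac{P(\mathbf{y}|H_1)}{P(\mathbf{y}|H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \eta,$$

where $\eta$ is chosen such that

$$P_{fp}(f_n, \mathbf{u}) = \sum_{\mathbf{y}:\ L(\mathbf{y}) > \eta} P_X(\mathbf{y}) = e^{-n\lambda}. \tag{7}$$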

Note that $\eta$ is a function of $\lambda$, $f_n$ and $\mathbf{u}$; therefore, we could not find a closed-form expression for $\eta$ for
any general embedding rule and watermark sequence. The false-negative probability associated with the
above optimal test is given by

$$P_{fn}(f_n, \lambda, \mathbf{u}) = \sum_{\mathbf{y}:\ L(\mathbf{y}) \le \eta}\ \sum_{\mathbf{x}:\ f_n(\mathbf{x}, \mathbf{u}) = \mathbf{y}} P_X(\mathbf{x}). \tag{8}$$

Now, given a distortion level $D_e$ measured using a distortion function $d_e(\cdot, \cdot)$, we would like to devise an
embedder $f_n$ which minimizes the false-negative probability while the distortion between the covertext $\mathbf{x}$
and the stegotext $\mathbf{y}$ does not exceed $nD_e$ and the false-positive probability is kept no larger than $e^{-n\lambda}$, i.e.,

$$f_n^* = \arg\min_{\substack{f_n:\ d_e(\mathbf{x}, f_n(\mathbf{x}, \mathbf{u})) \le nD_e\ \forall \mathbf{x}, \\ P_{fp}(f_n, \mathbf{u}) \le e^{-n\lambda}}} P_{fn}(f_n, \lambda, \mathbf{u}). \tag{9}$$
The above general problem of finding the optimal embedding rule and detection regions is by no means
trivial. The fact that the probabilities of the two kinds of error cannot be expressed in closed form makes
it very hard to solve this optimization problem and, as far as we know, there is no known solution for it.
Moreover, obtaining closed-form expressions for the optimal embedder and decision regions when $P_X$ is
unknown is even harder.

We therefore make an additional assumption regarding the statistics employed by the detector.
Suppose that we limit ourselves to the class of all detectors which base their decisions on certain empirical
statistics associated with $\mathbf{u}$ and $\mathbf{y}$, for example, the empirical joint distribution of $\mathbf{y}$ and $\mathbf{u}$, i.e., $\hat{P}_{\mathbf{u}\mathbf{y}}$.
Note that the requirement that the decision of the detector depends solely on $\hat{P}_{\mathbf{u}\mathbf{y}}$ means that $\Lambda$ and $\Lambda^c$
are unions of conditional type classes of $\mathbf{y}$ given $\mathbf{u}$.

It may seem, at first glance, that the sequence $\mathbf{u}$ is superfluous in the definition of the problem,
since it is available to all legitimate parties. However, the presence of the watermark sequence $\mathbf{u}$ at
the detector provides the detector with a refined version of the statistics of its input (based on the joint
empirical statistics of $\mathbf{y}$ and $\mathbf{u}$) and can be regarded as a secret key shared by both legitimate sides. This
additional information at the detector improves the overall performance of the system.

For a given $\lambda > 0$, define

$$\Lambda_* = \left\{\mathbf{y}:\ \ln P_X(\mathbf{y}) + n\hat{H}_{\mathbf{u}\mathbf{y}}(Y|U) + \lambda n - |\mathcal{A}| \ln(n+1) \le 0\right\}. \tag{10}$$

The following theorem asserts that $\Lambda_*$ is an asymptotically optimal decision region:

Theorem 1. (i) $P_{fp}(\Lambda_*) \le e^{-n(\lambda - \delta_n)}$, where $\lim_{n \to \infty} \delta_n = 0$.

(ii) For every $\Lambda \subseteq \mathcal{A}^n$ that satisfies $P_{fp}(\Lambda) \le e^{-n\lambda'}$ for some $\lambda' > \lambda$, we have $\Lambda_*^c \subseteq \Lambda^c$ for all sufficiently
large $n$.

The theorem asserts that $\Lambda_*$ fulfills the false-positive constraint while minimizing the
false-negative probability, i.e., for any decision region $\Lambda$ which fulfills the false-positive constraint and for
any embedding rule $f_n(\mathbf{x}, \mathbf{u})$ the following holds:

$$P_{fn}(\Lambda_*^c) \le P_{fn}(\Lambda^c). \tag{11}$$

Proof. Let $T(\mathbf{y}|\mathbf{u}) \subseteq \Lambda$. Then, we have

$$e^{-\lambda n} \ge \sum_{\mathbf{y}' \in \Lambda} P_X(\mathbf{y}') \ge \sum_{\mathbf{y}' \in T(\mathbf{y}|\mathbf{u})} P_X(\mathbf{y}') = |T(\mathbf{y}|\mathbf{u})| \cdot P_X(\mathbf{y}) \ge (n+1)^{-|\mathcal{A}|} e^{n\hat{H}_{\mathbf{u}\mathbf{y}}(Y|U)} \cdot P_X(\mathbf{y}), \tag{12}$$

where the first inequality is by the assumed false-positive constraint, the second inequality is since
$T(\mathbf{y}|\mathbf{u}) \subseteq \Lambda$, and the third step, an equality, is due to the fact that all sequences within $T(\mathbf{y}|\mathbf{u})$ are equiprobable
under $P_X$, as they all have the same empirical distribution, which forms the sufficient statistics for the
memoryless source $P_X$. In the last inequality, we use the well-known lower bound on the cardinality
of a conditional type class in terms of the empirical conditional entropy [27], defined as

$$\hat{H}_{\mathbf{u}\mathbf{y}}(Y|U) = -\sum_{u, y} \hat{P}_{\mathbf{u}\mathbf{y}}(u, y) \ln \hat{P}_{\mathbf{u}\mathbf{y}}(y|u), \tag{13}$$

where $\hat{P}_{\mathbf{u}\mathbf{y}}(y|u)$ is the empirical conditional probability of $Y$ given $U$. Taking logarithms in (12) shows that
every $\mathbf{y}$ whose conditional type class lies in $\Lambda$ satisfies the inequality defining $\Lambda_*$ in (10). We have thus shown that every
$T(\mathbf{y}|\mathbf{u})$ in $\Lambda$ is also in $\Lambda_*$; in other words, if $\Lambda$ satisfies the false-positive constraint (6), it must be a subset
of $\Lambda_*$. This means that $\Lambda_*^c \subseteq \Lambda^c$, and so the probability of $\Lambda_*^c$ is no larger than the probability of $\Lambda^c$, i.e.,
$\Lambda_*^c$ minimizes $P_{fn}$ among all $\Lambda^c$ corresponding to detectors that satisfy (6). To establish the asymptotic
optimality of $\Lambda_*$, it remains to show that $\Lambda_*$ itself has a false-positive exponent of at least $\lambda$, which is very
easy to show using the techniques of eq. (6)] and references therein. Therefore, we will not include
the proof of this fact here. Finally, note also that $\Lambda_*$ bases its decision solely on $\hat{P}_{\mathbf{u}\mathbf{y}}$, as required. □
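To make the decision region (10) concrete, here is a minimal sketch of the membership test for a finite alphabet. The function names and the toy sequences are assumptions for this illustration; the source is passed as a dictionary `px` of strictly positive symbol probabilities, and `len(px)` plays the role of the alphabet size.

```python
from collections import Counter
from math import log

def cond_entropy(us, ys):
    """Empirical conditional entropy H(Y|U) in nats, as in eq. (13)."""
    n = len(us)
    joint = Counter(zip(us, ys))
    marg = Counter(us)
    return -sum(c / n * log(c / marg[u]) for (u, _), c in joint.items())

def log_prob(ys, px):
    """ln P_X(y) for a memoryless source with positive probabilities px."""
    return sum(log(px[a]) for a in ys)

def in_lambda_star(ys, us, px, lam):
    """True if y belongs to the region of eq. (10), i.e. the detector decides H1."""
    n = len(ys)
    stat = (log_prob(ys, px) + n * cond_entropy(us, ys)
            + lam * n - len(px) * log(n + 1))
    return stat <= 0

# Toy example (assumed): binary alphabet, uniform source, n = 40, lambda = 0.3.
px = {0: 0.5, 1: 0.5}
u = [1, 1, -1, -1] * 10
y_marked = [1, 1, 0, 0] * 10   # fully determined by u, so H(Y|U) = 0
y_plain = [0, 1, 0, 1] * 10    # empirically independent of u, H(Y|U) = ln 2
lam = 0.3
```

In this toy case the "marked" sequence has zero empirical conditional entropy given $\mathbf{u}$ and falls inside the region, while the sequence that is empirically independent of $\mathbf{u}$ does not.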

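While this solves the problem of the optimal detector for a given $f_n$, we still have to specify the
optimal embedder $f_n^*$. Defining $\Gamma_*^c(f_n)$ to be the inverse image of $\Lambda_*^c$ given $\mathbf{u}$, i.e.,

$$\Gamma_*^c(f_n) = \{\mathbf{x}:\ f_n(\mathbf{x}, \mathbf{u}) \in \Lambda_*^c\} = \left\{\mathbf{x}:\ \ln P_X(f_n(\mathbf{x}, \mathbf{u})) + n\hat{H}_{\mathbf{u} f_n(\mathbf{x}, \mathbf{u})}(Y|U) + \lambda n - |\mathcal{A}| \ln(n+1) > 0\right\}, \tag{14}$$

then following eq. (5), $P_{fn}$ can be expressed as

$$P_{fn} = \sum_{\mathbf{x} \in \Gamma_*^c(f_n)} P_X(\mathbf{x}). \tag{15}$$

Consider now the following embedder:

$$f_n^*(\mathbf{x}, \mathbf{u}) = \arg\min_{\mathbf{y}:\ d_e(\mathbf{x}, \mathbf{y}) \le nD_e} \left[\ln P_X(\mathbf{y}) + n\hat{H}_{\mathbf{u}\mathbf{y}}(Y|U)\right], \tag{16}$$

where ties are resolved in an arbitrary fashion. Then, it is clear by definition that $\Gamma_*^c(f_n^*) \subseteq \Gamma_*^c(f_n)$ for
any other competing $f_n$ that satisfies the distortion constraint, and thus $f_n^*$ minimizes $P_{fn}$ subject to the
constraints.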
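For very small $n$, the embedder (16) can be evaluated by exhaustive search, which may help in checking intuition. The sketch below is only an illustration under assumptions that are not part of the paper: Hamming distortion is used for $d_e$, the alphabet is binary with a uniform source, and all names are hypothetical. The search is exponential in $n$ and is not meant as a practical implementation.

```python
from collections import Counter
from itertools import product
from math import inf, log

def cond_entropy(us, ys):
    """Empirical conditional entropy H(Y|U) in nats."""
    n = len(us)
    joint = Counter(zip(us, ys))
    marg = Counter(us)
    return -sum(c / n * log(c / marg[u]) for (u, _), c in joint.items())

def objective(y, u, px):
    """The quantity minimized in eq. (16): ln P_X(y) + n * H(Y|U)."""
    return sum(log(px[a]) for a in y) + len(y) * cond_entropy(u, y)

def hamming(x, y):
    """Assumed additive distortion measure: number of differing positions."""
    return sum(a != b for a, b in zip(x, y))

def embed_brute_force(x, u, px, max_dist):
    """Search all y within the distortion budget; ties keep the first minimizer."""
    best, best_val = None, inf
    for y in product(sorted(px), repeat=len(x)):
        if hamming(x, y) > max_dist:
            continue
        val = objective(y, u, px)
        if val < best_val:
            best, best_val = list(y), val
    return best

# Toy example (assumed values): n = 6, total distortion budget n * D_e = 2.
px = {0: 0.5, 1: 0.5}
x = [0, 0, 1, 1, 0, 1]
u = [1, -1, 1, -1, 1, -1]
y_star = embed_brute_force(x, u, px, max_dist=2)
```

With a uniform source the term $\ln P_X(\mathbf{y})$ is constant, so the search simply drives the empirical conditional entropy of $\mathbf{y}$ given $\mathbf{u}$ as low as the distortion budget allows.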



3 Discussion 

In this section, we pause to discuss a few important aspects of our basic results, as well as possible 
modifications that might be of theoretical and practical interest. 

3.1 Implementability of the Embedder (16)

The first impression might be that the minimization in (16) is prohibitively complex, as it appears to
require an exhaustive search over the sphere $\{\mathbf{y}:\ d_e(\mathbf{x}, \mathbf{y}) \le nD_e\}$, whose complexity is exponential in
$n$. A closer look, however, reveals that the situation is not that bad. Note that for a memoryless source
$P_X$,

$$\ln P_X(\mathbf{y}) = -n\left[\hat{H}_{\mathbf{y}}(Y) + \mathcal{D}(\hat{P}_{\mathbf{y}} \| P_X)\right], \tag{17}$$

where $\hat{H}_{\mathbf{y}}(Y)$ is the empirical entropy of $\mathbf{y}$ and $\mathcal{D}(\hat{P}_{\mathbf{y}} \| P_X)$ is the divergence between the empirical
distribution of $\mathbf{y}$, $\hat{P}_{\mathbf{y}}$, and the source $P_X$. Moreover, if $d_e(\cdot, \cdot)$ is an additive distortion measure, i.e.,
$d_e(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n d_e(x_i, y_i)$, then $d_e(\mathbf{x}, \mathbf{y})/n$ can be represented as the expected distortion with respect
to the empirical distribution of $\mathbf{x}$ and $\mathbf{y}$, $\hat{P}_{\mathbf{x}\mathbf{y}}$. Thus, the minimization in (16) becomes equivalent
to maximizing $[\hat{I}_{\mathbf{u}\mathbf{y}}(U;Y) + \mathcal{D}(\hat{P}_{\mathbf{y}} \| P_X)]$ subject to $\hat{E}_{\mathbf{x}\mathbf{y}} d_e(X, Y) \le D_e$, where $\hat{I}_{\mathbf{u}\mathbf{y}}(U;Y)$ denotes the
empirical mutual information induced by the joint empirical distribution $\hat{P}_{\mathbf{u}\mathbf{y}}$ and $\hat{E}_{\mathbf{x}\mathbf{y}}$ denotes the
aforementioned expectation with respect to $\hat{P}_{\mathbf{x}\mathbf{y}}$. Now, observe that for given $\mathbf{x}$ and $\mathbf{u}$, both
$[\hat{I}_{\mathbf{u}\mathbf{y}}(U;Y) + \mathcal{D}(\hat{P}_{\mathbf{y}} \| P_X)]$ and $\hat{E}_{\mathbf{x}\mathbf{y}} d_e(X, Y)$ depend on $\mathbf{y}$ only via its conditional type class given $(\mathbf{x}, \mathbf{u})$, namely,
the conditional empirical distribution $\hat{P}_{\mathbf{u}\mathbf{x}\mathbf{y}}(y|x, u)$. Once the optimal $\hat{P}_{\mathbf{u}\mathbf{x}\mathbf{y}}(y|x, u)$ has been found,
it does not matter which vector $\mathbf{y}$ is chosen from the corresponding conditional type class $T(\mathbf{y}|\mathbf{x}, \mathbf{u})$.
Therefore, the optimization across $n$-vectors in (16) boils down to optimization over empirical conditional
distributions, and since the total number of empirical conditional distributions of $n$-vectors increases only
polynomially with $n$, the search complexity reduces from exponential to polynomial as well. In practice,
one may not perform such an exhaustive search over the discrete set of empirical distributions, but apply
an optimization procedure in the continuous space of conditional distributions $\{P(y|x, u)\}$ (and then
approximate the solution by the closest feasible empirical distribution). At any rate, this optimization
procedure is carried out in a space of fixed dimension, which does not grow with $n$.
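Two facts used in the argument above can be checked numerically: the identity (17), and the polynomial growth of the number of types. The sketch below uses hypothetical names and an assumed toy source. The count `num_types` is the standard stars-and-bars count of unconditional types of length-$n$ sequences over a $k$-letter alphabet, used here only to illustrate polynomial versus exponential growth; it is not the count of conditional types discussed in the text.

```python
from collections import Counter
from math import comb, log

def empirical_dist(seq):
    """Relative frequency of each symbol in seq."""
    n = len(seq)
    return {a: c / n for a, c in Counter(seq).items()}

def log_prob_two_ways(y, px):
    """Return ln P_X(y) computed directly and via the right-hand side of eq. (17)."""
    n = len(y)
    p_hat = empirical_dist(y)
    entropy = -sum(p * log(p) for p in p_hat.values())
    divergence = sum(p * log(p / px[a]) for a, p in p_hat.items())
    direct = sum(log(px[a]) for a in y)
    via_types = -n * (entropy + divergence)
    return direct, via_types

def num_types(n, k):
    """Number of empirical distributions of length-n sequences over k letters."""
    return comb(n + k - 1, k - 1)

# Toy example (assumed values).
px = {0: 0.5, 1: 0.3, 2: 0.2}
y = [0, 0, 1, 2, 0, 1]
direct, via_types = log_prob_two_ways(y, px)
types_vs_sequences = (num_types(len(y), len(px)), len(px) ** len(y))
```

For this example there are 28 types but 729 sequences, and the gap widens rapidly with $n$.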

3.2 Universality in the Covertext Distribution 

Thus far we have assumed that the distribution $P_X$ is known. In practice, even if it is fine to assume a
certain model class, like the model of a memoryless source, the assumption that the exact parameters of
$P_X$ are known is rather questionable. Suppose then that $P_X$ is known to be memoryless but is otherwise
unknown. How should we modify our results? First observe that it would then make sense to insist on
the constraint (6) for every memoryless source, to be on the safe side. In other words, eq. (6) would be
replaced by

$$\max_{P_X} \sum_{\mathbf{y} \in \Lambda} P_X(\mathbf{y}) \le e^{-\lambda n}, \tag{18}$$

where the maximization over $P_X$ is across all memoryless sources with alphabet $\mathcal{A}$. It is then easy to see
that our earlier derivation goes through as before, except that $P_X(\mathbf{y})$ should be replaced by $\max_{P_X} P_X(\mathbf{y})$
in all places (see also [5]). Since $\ln \max_{P_X} P_X(\mathbf{y}) = -n\hat{H}_{\mathbf{y}}(Y)$, this means that the modified version
of $\Lambda_*$ compares $n$ times the empirical mutual information, $n\hat{I}_{\mathbf{u}\mathbf{y}}(U;Y)$, to the threshold $\lambda n - |\mathcal{A}| \ln(n+1)$ (the
divergence term now disappears). By the same token, and in light of the discussion in the previous
subsection, the modified version of the optimal embedder (16) maximizes $\hat{I}_{\mathbf{u}\mathbf{y}}(U;Y)$ subject to the
distortion constraint. Both the embedding rule and the detection rule are then based on the idea of
maximum mutual information, which is intuitively appealing. For more on this idea and its use as a
universal decoding rule, see [27, Sec. 2.5].
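The universal detection rule described above can be sketched as follows, assuming a finite alphabet whose size is passed explicitly. The threshold follows from substituting $\ln \max_{P_X} P_X(\mathbf{y}) = -n\hat{H}_{\mathbf{y}}(Y)$ into (10); the function names and the toy data are assumptions for this illustration.

```python
from collections import Counter
from math import log

def entropy_of_counts(counts, n):
    """Entropy in nats of the empirical distribution given by symbol counts."""
    return -sum(c / n * log(c / n) for c in counts.values())

def mutual_information(us, ys):
    """Empirical mutual information I(U;Y) = H(U) + H(Y) - H(U,Y), in nats."""
    n = len(us)
    return (entropy_of_counts(Counter(us), n)
            + entropy_of_counts(Counter(ys), n)
            - entropy_of_counts(Counter(zip(us, ys)), n))

def mmi_detect(us, ys, lam, alphabet_size):
    """Decide H1 when n * I(U;Y) reaches lam * n - |A| * ln(n + 1)."""
    n = len(us)
    return n * mutual_information(us, ys) >= lam * n - alphabet_size * log(n + 1)

# Toy example (assumed values): n = 40, lambda = 0.3, binary alphabet.
u = [1, 1, -1, -1] * 10
y_marked = [1, 1, 0, 0] * 10   # I(U;Y) = ln 2
y_plain = [0, 1, 0, 1] * 10    # I(U;Y) = 0
lam = 0.3
```

Note that the detector no longer needs the source probabilities, only the joint empirical statistics of $\mathbf{u}$ and $\mathbf{y}$.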

3.3 Other Detector Statistics 

In the previous section, we focused on the class of detectors that base their decision on the empirical
joint distribution of pairs of letters $\{(u, y)\}$. What about classes of detectors that base their decisions
on larger (and more refined) sets of statistics? It turns out that such extensions are possible as long as
we are able to assess the cardinality of the corresponding conditional type class. For example, suppose
that the stegotext is suspected to undergo a desynchronization attack that cyclically shifts the data by $k$
positions, where $k$ lies in some uncertainty region, say, $\{-K, -K+1, \ldots, -1, 0, 1, \ldots, K\}$. Then, it would
make sense to allow the detector to depend on the joint distribution of $2K + 2$ vectors: $\mathbf{y}$, $\mathbf{u}$, and all the $2K$
corresponding cyclic shifts of $\mathbf{u}$. Our earlier analysis will carry over provided that the above definition of
$\hat{H}_{\mathbf{u}\mathbf{y}}(Y|U)$ is replaced by the conditional empirical entropy of $\mathbf{y}$ given $\mathbf{u}$ and all its cyclic shifts. This
is different from the exhaustive search (ES) approach (see, e.g., [25]) to confront such desynchronization
attacks. Note, however, that this works as long as $K$ is fixed and does not grow with $n$.

3.4 Random Watermarks 

Thus far, our model assumption was that $\mathbf{x}$ emerges from a probabilistic source $P_X$, whereas the
watermark $\mathbf{u}$ is fixed, and hence can be thought of as being deterministic. Another possible setting assumes
that $\mathbf{u}$ is random as well, in particular, being drawn from another source $P_U$, independently of $\mathbf{x}$,
normally the binary symmetric source (BSS). This situation may arise, for example, when security is an
issue and then the watermark is encrypted. In such a case, the randomness of $\mathbf{u}$ is induced by the
randomness of the key. Here, the decision regions $\Lambda$ and $\Lambda^c$ will be defined as subsets of $\mathcal{A}^n \times \mathcal{B}^n$ and
the probabilities of errors $P_{fn}$ and $P_{fp}$ will be defined, of course, as the corresponding summations of
products $P_X(\mathbf{x})P_U(\mathbf{u})$. The fact that $\mathbf{u}$ is emitted from a memoryless source with a known distribution
makes this model weaker compared to the model treated above, in which $\mathbf{u}$ is an individual sequence.
Although this model is somewhat weaker, it can be analyzed for more general classes of detectors. This
is because the role of the conditional type class $T(\mathbf{y}|\mathbf{u})$ would be replaced by the joint type class $T(\mathbf{u}, \mathbf{y})$,
namely, the set of all pairs of sequences $\{(\mathbf{u}', \mathbf{y}')\}$ that have the same empirical distribution as $(\mathbf{u}, \mathbf{y})$ (as
opposed to the conditional type class, which is defined as the set of all such $\mathbf{y}$'s for a given $\mathbf{u}$). Thus, the
corresponding version of $\Lambda_*$ would be

$$\Lambda_* = \left\{(\mathbf{u}, \mathbf{y}):\ \ln P_X(\mathbf{y}) + \ln P_U(\mathbf{u}) + n\hat{H}_{\mathbf{u}\mathbf{y}}(U, Y) + \lambda n - |\mathcal{A}| \ln(n+1) \le 0\right\}, \tag{19}$$

where $\hat{H}_{\mathbf{u}\mathbf{y}}(U, Y)$ is the empirical joint entropy induced by $(\mathbf{u}, \mathbf{y})$, and the optimal
embedder is derived accordingly.^1 The advantage of this model, albeit somewhat weaker, is that it is easier to
assess $|T(\mathbf{u}, \mathbf{y})|$ in more general situations than it is for $|T(\mathbf{y}|\mathbf{u})|$. For example, if $\mathbf{x}$ is a first-order Markov
source, rather than i.i.d., and one is then naturally interested in the statistics formed by the frequency
counts of triples $\{u_i = u,\ y_i = y,\ y_{i-1} = y'\}$, then there is no known expression for the cardinality of
the corresponding conditional type class, but it is still possible to assess the size of the joint type class
in terms of the empirical first-order Markov entropy of the pairs $\{(u_i, y_i)\}$. Another example of the
differences between random watermark and deterministic watermark can be seen in Section [HI

It should also be pointed out that once $\mathbf{u}$ is assumed random (say, drawn from a BSS), it is possible
to devise a decision rule that is asymptotically optimum for an individual covertext sequence, i.e., to drop
the assumption that $\mathbf{x}$ emerges from a probabilistic source of a known model. The resulting decision
rule, obtained using a similar technique, accepts $H_1$ whenever $\hat{H}_{\mathbf{u}\mathbf{y}}(U|Y) \le 1 - \lambda$, and the embedder
minimizes $\hat{H}_{\mathbf{u}\mathbf{y}}(U|Y)$ subject to the distortion constraint accordingly.

^1 Note that in the universal case (where both $P_X$ and $P_U$ are unknown), this leads again to the same empirical mutual
information detector as before.

4 Continuous Alphabet — the Gaussian Case 

In the previous sections, we considered, for convenience, the simple case where the components of both
$\mathbf{x}$ and $\mathbf{y}$ take on values in a finite alphabet. It is more common and more natural, however, to model
$\mathbf{x}$ and $\mathbf{y}$ as vectors in $\mathbb{R}^n$. Beyond the fact that summations should be replaced by integrals in the
analysis of the previous sections, this requires, in general, an extension of the method of types [27], used
above, to vectors with real-valued components (see, e.g., [29], [30], [31]). In a nutshell, a conditional type
class, in such a case, is the set of all $\mathbf{y}$-vectors in $\mathbb{R}^n$ whose joint sufficient statistics with $\mathbf{u}$ have (within
infinitesimally small tolerance) prescribed values, and to have a parallel analysis to that of the previous
sections, we have to be able to assess the exponential order of the volume of the conditional type class.

Suppose that $\mathbf{x}$ is a zero-mean Gaussian vector whose covariance matrix is $\sigma^2 I$, $I$ being the $n \times n$
identity matrix, and $\sigma^2$ is unknown (cf. Subsection 3.2). Let us suppose also that the statistics to be
employed by the detector are the energy $\sum_{i=1}^n y_i^2$ and the correlation $\sum_{i=1}^n u_i y_i$. These assumptions are
the same as in many theoretical papers in the literature of watermark detection. Then, the conditional
empirical entropy $\hat{H}_{\mathbf{u}\mathbf{y}}(Y|U)$ should be replaced by the empirical differential entropy $\hat{h}_{\mathbf{u}\mathbf{y}}(Y|U)$, given
by [30]:



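$$\hat{h}_{\mathbf{u}\mathbf{y}}(Y|U) = \frac{1}{2} \ln\left[2\pi e \cdot \min_{\beta} \frac{1}{n}\sum_{i=1}^n (y_i - \beta u_i)^2\right]
= \frac{1}{2} \ln\left[2\pi e \left(\frac{1}{n}\sum_{i=1}^n y_i^2 - \frac{\left(\frac{1}{n}\sum_{i=1}^n u_i y_i\right)^2}{\frac{1}{n}\sum_{i=1}^n u_i^2}\right)\right]
= \frac{1}{2} \ln\left[2\pi e \left(\frac{1}{n}\sum_{i=1}^n y_i^2 - \left(\frac{1}{n}\sum_{i=1}^n u_i y_i\right)^2\right)\right], \tag{20}$$

where the last equality holds because $u_i^2 = 1$ for all $i$.

The justification of eq. (20) is as follows: For a given $\epsilon > 0$, define the set

$$T_\epsilon(\mathbf{y}|\mathbf{u}) = \left\{\tilde{\mathbf{y}} \in \mathbb{R}^n:\ \left|\sum_{i=1}^n \tilde{y}_i^2 - \sum_{i=1}^n y_i^2\right| \le n\epsilon,\ \left|\sum_{i=1}^n \tilde{y}_i u_i - \sum_{i=1}^n y_i u_i\right| \le n\epsilon\right\}. \tag{21}$$

Similarly as in Lemma 3 of [30], it can be shown that

$$\lim_{\epsilon \to 0} \lim_{n \to \infty} \frac{1}{n} \ln \mathrm{Vol}\{T_\epsilon(\mathbf{y}|\mathbf{u})\} = \hat{h}_{\mathbf{u}\mathbf{y}}(Y|U). \tag{22}$$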

To see this, define an auxiliary channel $\tilde{\mathbf{y}} = \beta\mathbf{u} + \mathbf{z}$, where $\mathbf{z} \sim \mathcal{N}(0, \sigma^2 I)$ (this channel is used only
to evaluate $\mathrm{Vol}\{T_\epsilon(\mathbf{y}|\mathbf{u})\}$ and is not related to the actual distribution of $\mathbf{y}$ given $\mathbf{u}$; see [30, p. 1262]).
By tuning the parameters $\beta$ and $\sigma^2$ such that the expectations of $\frac{1}{n}\sum_{i=1}^n \tilde{y}_i^2$ and $\frac{1}{n}\sum_{i=1}^n \tilde{y}_i u_i$ would
be $\frac{1}{n}\sum_{i=1}^n y_i^2$ and $\frac{1}{n}\sum_{i=1}^n y_i u_i$, respectively, the set $T_\epsilon(\mathbf{y}|\mathbf{u})$ has a high probability under the auxiliary
channel given $\mathbf{u}$. Moreover, any two vectors in $T_\epsilon(\mathbf{y}|\mathbf{u})$ have conditional pdf's which are exponentially
equivalent. Accordingly, using the same technique as in the proof of Lemma 3 in [30, p. 1268] (which is
based on these observations), we derive an upper and a lower bound on $\mathrm{Vol}\{T_\epsilon(\mathbf{y}|\mathbf{u})\}$. These bounds are
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identical in the logarithmic scale, and so, 

Vol{T 6 (i/|u)}=e n [ A «W( y l^+AW] ) (23) 

and lim e ^o A(e) = 0. 

Note that the order in which the limits are taken in (22) is important: We first take the dimension $n$ to infinity, and only then we take $\epsilon$ to zero. Mathematically speaking, if $\epsilon$ goes to zero for a finite dimension $n$, the volume of $T_\epsilon(y|u)$ equals zero. The order of the limits has a practical meaning too. The fact that $\epsilon$ is positive for any given dimension means that the detector can calculate the correlation and energy only with limited precision. In the absence of such a realistic limitation, one can offer an embedding rule (under the attack-free case and for a continuous alphabet) with zero false-negative and false-positive probabilities by designing an embedder with a range having measure zero (see footnote 2). This additional limitation, which we implicitly impose on the detector, is very natural and exists in every practical system.
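As a small numerical illustration of the empirical differential entropy in (20), the following sketch evaluates it in two ways: from the closed form in terms of the energy and correlation statistics, and from the least-squares residual that attains the minimum over $\beta$. The function names and the synthetic data are assumptions made for this example only.

```python
import math
import random

def empirical_cond_diff_entropy(y, u):
    """Closed form of (20): 0.5*ln(2*pi*e*(E_yy - E_uy^2 / E_uu)), in nats."""
    n = len(y)
    e_yy = sum(v * v for v in y) / n
    e_uy = sum(a * b for a, b in zip(u, y)) / n
    e_uu = sum(a * a for a in u) / n
    return 0.5 * math.log(2 * math.pi * math.e * (e_yy - e_uy ** 2 / e_uu))

def entropy_via_least_squares(y, u):
    """Same quantity, computed from the beta that minimizes the residual energy."""
    beta = sum(a * b for a, b in zip(u, y)) / sum(a * a for a in u)
    mse = sum((b - beta * a) ** 2 for a, b in zip(u, y)) / len(y)
    return 0.5 * math.log(2 * math.pi * math.e * mse)

# Synthetic example (hypothetical data): binary watermark, noisy linear embedding.
rng = random.Random(0)
u = [rng.choice((-1.0, 1.0)) for _ in range(1000)]
y = [0.7 * ui + rng.gauss(0.0, 1.0) for ui in u]
h_closed = empirical_cond_diff_entropy(y, u)
h_ls = entropy_via_least_squares(y, u)
```

Both routes depend on $y$ only through the two statistics the detector is allowed to use, which is the point of restricting attention to this type class.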

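Using the same technique used to evaluate $\hat{h}_{uy}(Y|U)$ in (20), it can easily be shown that

$$\lim_{\epsilon \to 0}\lim_{n \to \infty} \frac{1}{n}\ln \mathrm{Vol}\{T_\epsilon(y)\} = \frac{1}{2}\ln\left(2\pi e \cdot \frac{1}{n}\sum_{i=1}^n y_i^2\right) \triangleq \hat{h}_y(Y), \tag{24}$$

where

$$T_\epsilon(y) = \left\{\tilde{y} \in \mathbb{R}^n : \left|\sum_{i=1}^n \tilde{y}_i^2 - \sum_{i=1}^n y_i^2\right| \le n\epsilon\right\}. \tag{25}$$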

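Therefore, the optimal embedder maximizes

$$\hat{I}_{uy}(U;Y) = \hat{h}_y(Y) - \hat{h}_{uy}(Y|U) = \frac{1}{2}\ln\left[\frac{\frac{1}{n}\sum_{i=1}^n y_i^2}{\frac{1}{n}\sum_{i=1}^n y_i^2 - \left(\frac{1}{n}\sum_{i=1}^n u_i y_i\right)^2}\right] \tag{26}$$

or, equivalently (see footnote 3), maximizes

$$R(u,y) \triangleq \frac{\langle u, y\rangle^2}{\|y\|^2} = \frac{\left(\sum_{i=1}^n u_i y_i\right)^2}{\sum_{i=1}^n y_i^2} \tag{27}$$

subject to the distortion constraint, which in this case will naturally be taken to be Euclidean, $\sum_{i=1}^n (x_i - y_i)^2 \le nD_e$. While our discussion in Subsection 3.1, regarding optimization over conditional distributions, does not apply directly to the continuous case considered here, the problem can still be represented as optimization over a finite dimensional space whose dimension is fixed, independently of $n$. In fact, this fixed dimension is 2, as is implied by the next lemma.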

Lemma 1. The optimal embedding rule under the above setting has the following form:

$$f^*(x,u) = ax + bu. \tag{28}$$



2 E.g., the spread-transform dither modulation (STDM) embedder proposed in [32, Sec. V.B] achieves zero false-negative
probability under the attack-free scenario because the embedder range has measure zero. We thank M. Barni for drawing
our attention to this fact.

3 Note also that the corresponding detector, which compares $\hat{I}_{uy}(U;Y)$ to a threshold, is equivalent to a correlation
detector, which compares the (absolute) correlation to a threshold that depends on the energy of $y$, rather than a fixed
threshold (see, e.g., [28]).
Proof. Clearly, every $y \in \mathbb{R}^n$ can be represented as $y = ax + bu + z$, where $a$ and $b$ are real-valued coefficients and $z$ is orthogonal to both $x$ and $u$ (i.e., $\langle u, z\rangle = \langle x, z\rangle = 0$). Now, for any given $y = ax + bu + z$ such that $z \ne 0$, the vector projected onto the subspace spanned by $x$ and $u$, $\tilde{y} = ax + bu$, achieves a squared normalized correlation w.r.t. $u$ that is at least as high as that of $y$. To see this, consider the following chain of inequalities:

$$R(u,y) = \frac{\langle u, y\rangle^2}{\|y\|^2} = \frac{\langle u, ax + bu + z\rangle^2}{\langle ax + bu + z, ax + bu + z\rangle} = \frac{\langle u, ax + bu\rangle^2}{\|ax + bu\|^2 + \|z\|^2} \le \frac{\langle u, ax + bu\rangle^2}{\|ax + bu\|^2} = R(u,\tilde{y}). \tag{29}$$

In addition, if $y$ fulfills the distortion constraint, then so does the projected vector $\tilde{y}$, i.e.,

$$\|y - x\|^2 = \|(a-1)x + bu + z\|^2 = \|(a-1)x + bu\|^2 + \|z\|^2 \ge \|(a-1)x + bu\|^2 = \|\tilde{y} - x\|^2. \tag{30}$$

Therefore, the optimal embedder must have the form $y = ax + bu$. In summary, given any $y$ that satisfies the distortion constraint, by projecting $y$ onto the subspace spanned by $x$ and $u$, we improve the correlation without violating the distortion constraint. □
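The projection argument in the proof of Lemma 1 can be checked numerically. The sketch below, with hypothetical helper names and synthetic data, projects a candidate stegotext onto the span of $x$ and $u$ by solving the $2 \times 2$ normal equations, and then compares the squared normalized correlation and the distortion before and after the projection.

```python
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def corr_ratio(u, y):
    """Squared normalized correlation R(u, y) = <u, y>^2 / ||y||^2."""
    return dot(u, y) ** 2 / dot(y, y)

def project_onto_span(y, x, u):
    """Least-squares coefficients (a, b) with y ~ a*x + b*u, and the projected vector."""
    gxx, gxu, guu = dot(x, x), dot(x, u), dot(u, u)
    ryx, ryu = dot(y, x), dot(y, u)
    det = gxx * guu - gxu * gxu  # nonzero when x and u are not collinear
    a = (ryx * guu - ryu * gxu) / det
    b = (ryu * gxx - ryx * gxu) / det
    return a, b, [a * xi + b * ui for xi, ui in zip(x, u)]

def sq_dist(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

# Synthetic example (hypothetical data): y has a component outside span{x, u}.
rng = random.Random(1)
n = 200
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
u = [rng.choice((-1.0, 1.0)) for _ in range(n)]
y = [xi + 0.3 * ui + rng.gauss(0.0, 0.2) for xi, ui in zip(x, u)]
a, b, y_proj = project_onto_span(y, x, u)
```

In this example the projection leaves $\langle u, y\rangle$ unchanged up to rounding while shrinking $\|y\|$, which is exactly the mechanism behind (29) and (30).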

Upon manipulating this optimization problem, by taking advantage of its special structure, one can further reduce its dimensionality and transform it into a search over one parameter only (the details are in Subsection 4.1).

Going back to the opening discussion in the Introduction, at first glance, this seems to be very close to the linear embedder (1) that is so customarily used (with one additional degree of freedom allowing also scaling of $x$). A closer look, however, reveals that this is not quite the case because the optimal values of $a$ and $b$ depend here on $x$ and $u$ (via the joint statistics $\sum_{i=1}^n x_i^2$ and $\sum_{i=1}^n u_i x_i$) rather than being fixed. Therefore, this is not a linear embedder.

4.1 Explicit Derivation of the Optimal Embedder 

In this subsection, we present a closed-form expression for the optimal embedder. As was shown in the 
previous section, the following optimization problem should be solved: 

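$$\max_{y \in \mathbb{R}^n} \frac{\left(\frac{1}{n}\sum_{i=1}^n u_i y_i\right)^2}{\frac{1}{n}\sum_{i=1}^n y_i^2} \quad \text{subject to: } \sum_{i=1}^n (y_i - x_i)^2 \le nD_e. \tag{31}$$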
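Substituting $y = ax + bu$ in eq. (31) gives:

$$\max_{a,b \in \mathbb{R}} \frac{a^2\rho^2 + 2ab\rho + b^2}{a^2\alpha^2 + 2ab\rho + b^2} \quad \text{subject to: } (a-1)^2\alpha^2 + 2(a-1)b\rho + b^2 \le D_e, \tag{32}$$

where $\alpha^2 = \frac{1}{n}\sum_{i=1}^n x_i^2$ and $\rho = \frac{1}{n}\sum_{i=1}^n u_i x_i$. Note that $\alpha^2 \ge \rho^2$ by the Cauchy-Schwarz inequality.

Theorem 2. The optimal values of $(a,b)$ are:

• If $D_e \ge \alpha^2 - \rho^2$:

$$a^* = 0; \quad b^* = \rho + \sqrt{\rho^2 - \alpha^2 + D_e}. \tag{33}$$

• If $D_e < \alpha^2 - \rho^2$:

$$a^* = \arg\max\left\{|t(a)| : a \in \{a_1, a_2, a_3, a_4\} \cap \mathcal{R}\right\}, \quad b^* = a^* \cdot t(a^*), \tag{34}$$

where

$$t(a) \triangleq \frac{(1-a)\rho + \mathrm{sgn}(\rho)\sqrt{D_e - (a-1)^2(\alpha^2 - \rho^2)}}{a}, \tag{35}$$

$$\mathcal{R} \triangleq \left[1 - \sqrt{\frac{D_e}{\alpha^2 - \rho^2}},\ 1 + \sqrt{\frac{D_e}{\alpha^2 - \rho^2}}\right], \tag{36}$$

and

$$a_{1,2} = \frac{(\alpha^2 - \rho^2)(\alpha^2 - D_e) \pm \sqrt{D_e\rho^2(\alpha^2 - \rho^2)(\alpha^2 - D_e)}}{\alpha^2(\alpha^2 - \rho^2)}, \quad a_{3,4} = 1 \pm \sqrt{\frac{D_e}{\alpha^2 - \rho^2}}. \tag{37}$$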

The proof is purely technical and therefore is deferred to the Appendix. We note that in the case where $D_e \ll \alpha^2 - \rho^2$, the value of $a^*$ tends to 1, and the value of $b^*$ tends to $\mathrm{sgn}(\rho)\sqrt{D_e}$. Hence, the linear embedder is not optimal even in the case where $D_e \ll \alpha^2$. We will next use the above values to devise a lower bound on the exponential decay rate of the false-negative probability of the optimal embedder, and then compare it to an upper bound on the false-negative exponent of the linear embedder.
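The two-parameter problem can be explored numerically. The sketch below is an illustration under stated assumptions, not the Appendix proof: it takes the empirical statistics $\alpha^2$ and $\rho$ as inputs, places $b$ on the distortion boundary for each $a$, uses as candidates the stationary points obtained by differentiating the boundary objective together with the endpoints of the admissible range of $a$, and selects among them by evaluating the objective directly (rather than through $|t(a)|$). A brute-force grid search serves as a cross-check. All function names are hypothetical.

```python
import math

def objective(a, b, alpha2, rho):
    """Squared normalized correlation of y = a*x + b*u in terms of (alpha^2, rho)."""
    return (a * rho + b) ** 2 / (a * a * alpha2 + 2 * a * b * rho + b * b)

def boundary_b(a, alpha2, rho, De):
    """b on the distortion boundary for a given a, with the sign chosen as sgn(rho)."""
    s = 1.0 if rho >= 0 else -1.0
    w2 = De - (a - 1.0) ** 2 * (alpha2 - rho * rho)
    return (1.0 - a) * rho + s * math.sqrt(max(w2, 0.0))

def optimal_ab(alpha2, rho, De):
    c = alpha2 - rho * rho
    if De >= c:  # enough distortion budget to erase the covertext (a = 0)
        return 0.0, rho + math.sqrt(rho * rho - alpha2 + De)
    half = math.sqrt(De / c)
    root = math.sqrt(De * rho * rho * c * (alpha2 - De))
    cands = [(c * (alpha2 - De) + root) / (alpha2 * c),
             (c * (alpha2 - De) - root) / (alpha2 * c),
             1.0 - half, 1.0 + half]
    cands = [a for a in cands if 1.0 - half - 1e-12 <= a <= 1.0 + half + 1e-12]
    best = max(cands, key=lambda a: objective(a, boundary_b(a, alpha2, rho, De), alpha2, rho))
    return best, boundary_b(best, alpha2, rho, De)

def grid_best(alpha2, rho, De, steps=20001):
    """Brute-force maximum over a in the admissible range (case De < alpha^2 - rho^2)."""
    half = math.sqrt(De / (alpha2 - rho * rho))
    best = 0.0
    for k in range(steps):
        a = 1.0 - half + 2.0 * half * k / (steps - 1)
        best = max(best, objective(a, boundary_b(a, alpha2, rho, De), alpha2, rho))
    return best

a_opt, b_opt = optimal_ab(1.0, 0.3, 0.1)
value_opt = objective(a_opt, b_opt, 1.0, 0.3)
value_additive = objective(1.0, math.sqrt(0.1), 1.0, 0.3)  # fixed a = 1, b = sqrt(De)
```

For the illustrative values $\alpha^2 = 1$, $\rho = 0.3$, $D_e = 0.1$, the selected $a$ is noticeably below 1, and the resulting objective exceeds that of the fixed-coefficient additive rule.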

4.2 Lower Bound to the False Negative Error Exponent of the Optimal Embedder

Since the calculation of the exact false-negative exponent of the optimal embedder is highly non-trivial, in this subsection we derive a lower bound on this exponent. Later, we show that even this lower bound is by far larger than the exponent of the false-negative probability of the additive embedder. Therefore, the additive embedder is sub-optimal in terms of the exponential decay rate of its false-negative probability.
The lower bound will be obtained by exploring the performance of a sub-optimal embedder of the form $y = x + \mathrm{sgn}(\rho)\sqrt{D_e}\,u$, which we name the sign embedder. This embedder is obtained by setting $a = 1$ in Theorem 2 (note that this value is in the allowable range $\mathcal{R}$ of $a$). We assume that $X \sim \mathcal{N}(0, \sigma^2 I)$. First, we calculate a threshold value $T$ which always guarantees a false-positive exponent not smaller than $\lambda$. Using the proposed detector (26), the false-positive probability can be expressed as

$$P_{fp} = \Pr\left\{\hat{I}_{uy}(U;Y) > T \mid H_0\right\} = \Pr\left\{\hat{\rho}_{uy}^2 > 1 - e^{-2T} \mid H_0\right\} = 2\Pr\left\{\hat{\rho}_{uy} > \sqrt{1 - e^{-2T}} \mid H_0\right\}$$

where $\hat{\rho}_{uy} = \frac{\langle u, y\rangle}{\|u\| \cdot \|y\|}$ is the normalized correlation between $u$ and $y$. Because under $H_0$ we have $Y = X$, and because of the radial symmetry of the pdf of $X$, we can conclude that for large $n$ [33, p. 295]:

$$P_{fp} = \frac{2A_n(\theta)}{A_n(\pi)} \doteq e^{n\ln(\sin\theta)}$$

where $A_n(\theta)$ (see footnote 4) is the surface area of the $n$-dimensional spherical cap cut from a unit sphere about the origin by a right circular cone of half angle $\theta = \arccos\left(\sqrt{1 - e^{-2T}}\right)$ ($0 < \theta \le \pi/2$). Since we required that $P_{fp} \le e^{-n\lambda}$, $\ln(\sin\theta)$ must not exceed $-\lambda$, which means that

$$-\lambda \ge \ln(\sin\theta) \quad \Longrightarrow \quad T \ge -\frac{1}{2}\ln\left[1 - \cos^2\left(\arcsin(e^{-\lambda})\right)\right] = \lambda, \tag{38}$$

where the last equality was obtained using the fact that $\cos(\arcsin(x)) = \sqrt{1 - x^2}$. Hence, setting $T = \lambda$ ensures a false-positive probability not greater than $e^{-n\lambda}$ for large $n$. Define the false-negative exponent of the sign embedder

$$E_{fn}^{se} \triangleq \lim_{n \to \infty} -\frac{1}{n}\ln P_{fn} \tag{39}$$

where the false-negative probability is given by

$$P_{fn} = \Pr\left\{\hat{I}_{uy}(U;Y) \le \lambda \mid H_1\right\} = \Pr\left\{\hat{\rho}_{uy}^2 \le 1 - e^{-2\lambda} \mid H_1\right\}. \tag{40}$$
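The exponential behavior of the spherical-cap ratio used in the threshold calculation can be checked numerically. The sketch below assumes only that the cap area is proportional to $\int_0^\theta \sin^{n-2}(\varphi)\,d\varphi$, so that $2A_n(\theta)/A_n(\pi)$ is the ratio of this integral to the same integral taken up to $\pi/2$. It evaluates the integrals in the log domain (to avoid underflow) and compares the normalized logarithm with $\ln(\sin\theta)$. The function names and the choice $n = 2000$ are illustrative.

```python
import math

def log_int_sin_pow(m, theta, steps=20000):
    """ln of the integral of sin(phi)^m over [0, theta], midpoint rule in the log domain."""
    h = theta / steps
    logs = [m * math.log(math.sin((k + 0.5) * h)) for k in range(steps)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs)) + math.log(h)

def cap_exponent(n, lam):
    """(1/n) * ln[2 A_n(theta) / A_n(pi)] for the half angle with sin(theta) = exp(-lam)."""
    theta = math.asin(math.exp(-lam))
    num = log_int_sin_pow(n - 2, theta)
    den = log_int_sin_pow(n - 2, math.pi / 2)
    return (num - den) / n

def threshold_for(lam):
    """Right-hand side of the threshold condition: -0.5*ln(1 - cos^2(arcsin(e^-lam)))."""
    return -0.5 * math.log(1.0 - math.cos(math.asin(math.exp(-lam))) ** 2)

exponent_estimate = cap_exponent(2000, 0.5)  # should be close to -0.5 for large n
```

The residual gap between the estimate and $-\lambda$ is of order $\ln(n)/n$, which is consistent with equality only on the exponential scale.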

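Theorem 3. The false-negative exponent of the sign embedder is given by

$$E_{fn}^{se}(\lambda, D_e) = \begin{cases} 0, & \dfrac{D_e e^{-2\lambda}}{1 - e^{-2\lambda}} \le \sigma^2 \\[2ex] \dfrac{1}{2}\left[\dfrac{D_e e^{-2\lambda}}{\sigma^2(1 - e^{-2\lambda})} - \ln\left(\dfrac{D_e e^{-2\lambda}}{\sigma^2(1 - e^{-2\lambda})}\right) - 1\right], & \text{else.} \end{cases} \tag{41}$$

The proof, which is mainly technical, is deferred to the Appendix. Let us explore some of the properties of $E_{fn}^{se}(\lambda, D_e)$. First, it is clear that $E_{fn}^{se}(0, D_e) = \infty$ (the detector output is constantly $H_1$) since $\hat{\rho}_{uy}^2 \ge 0$. In addition, $E_{fn}^{se}(\lambda, 0) = 0$ ($y = x$ and therefore does not contain any information on $u$). For a given $D_e$, $E_{fn}^{se}(\lambda, D_e) = 0$ for $\lambda \ge \frac{1}{2}\ln\left(1 + \frac{D_e}{\sigma^2}\right)$.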

The exact value of the optimal exponent achieved when the optimal embedder is employed is too involved to calculate. However, we can use some of the properties of the optimal embedder to improve the lower bound on the optimal exponent. According to Theorem 2, in the case where $D_e \ge \alpha^2 - \rho^2$, the optimal embedder can completely "erase" the covertext and therefore achieves a zero false-negative probability. We use this property to improve the performance by introducing a sub-optimal embedder which outperforms the sign embedder. Since $D_e \ge \alpha^2$ implies $D_e \ge \alpha^2 - \rho^2$, the following embedding rule is obtained: $y = ax + bu$ where

$$(a, b) = \begin{cases} \left(0,\ \rho + \sqrt{\rho^2 - \alpha^2 + D_e}\right), & D_e \ge \alpha^2 \\ \left(1,\ \mathrm{sgn}(\rho)\sqrt{D_e}\right), & \text{else.} \end{cases} \tag{42}$$

This embedder, which is an improved version of the sign embedder (but still sub-optimal), erases the covertext in the cases where $D_e \ge \alpha^2$ (to keep the embedding rule a function of one parameter, we chose to "erase" the covertext only if $D_e \ge \alpha^2$). Its performance is presented in the following Corollary:

4 It is well-known [33, p. 293] that $A_n(\theta) = \frac{(n-1)\pi^{(n-1)/2}}{\Gamma\left(\frac{n+1}{2}\right)}\int_0^\theta \sin^{n-2}(\varphi)\,d\varphi$ and $A_n(\pi) = 2A_n(\pi/2)$.

Corollary 1. For $\lambda \ge \frac{1}{2}\ln 2$, the false-negative exponent of the improved sign embedder is given by:

$$E_{fn}^{ise}(\lambda, D_e) = \begin{cases} \dfrac{1}{2}\left[\dfrac{D_e}{\sigma^2} - \ln\left(\dfrac{D_e}{\sigma^2}\right) - 1\right], & D_e > \sigma^2 \\[2ex] 0, & \text{else;} \end{cases} \tag{43}$$

otherwise, the false-negative exponent equals $E_{fn}^{se}(\lambda, D_e)$.

The proof is deferred to the Appendix. The fact that the optimal embedder can offer a positive false-negative exponent for every value of $\lambda$ is not surprising due to its ability to erase the covertext, which leads to zero probability of false negative. Although the improved sign embedder can offer a tighter lower bound, the improvement is made only in the case where $D_e > \sigma^2$ (though $\sigma^2$ is not known a priori to the embedder). Nevertheless, it emphasizes the true potential of the optimal embedder and the fact that the sign embedder is truly inferior to the optimal embedder. In Figure 2, the false-negative exponent of the sign embedder and the false-negative exponent of the improved embedder are plotted as functions of $\lambda$ for given values of $D_e$ and $\sigma$. The point where the two graphs break apart is $\lambda = \frac{1}{2}\ln(2)$. From this point on, the improved sign embedder achieves a fixed value of $0.5\left(D_e/\sigma^2 - \ln(D_e/\sigma^2) - 1\right)$.



Figure 2: Error exponents of the sign embedder and its improved version for $\sigma^2 = 1$ and $D_e = 2$.
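The two curves described for Figure 2 can be tabulated with a few lines of code. The sketch below assumes the following reading of the exponents: the sign embedder's exponent is $\frac{1}{2}(v - \ln v - 1)$ with $v = D_e e^{-2\lambda}/(\sigma^2(1 - e^{-2\lambda}))$ when $v > 1$ and zero otherwise, and the improved version switches to the fixed value $\frac{1}{2}(D_e/\sigma^2 - \ln(D_e/\sigma^2) - 1)$ from $\lambda = \frac{1}{2}\ln 2$ onward. The function names are hypothetical.

```python
import math

def kl_rate(v):
    """0.5*(v - ln v - 1): large-deviation rate for a normalized Gaussian energy v > 0."""
    return 0.5 * (v - math.log(v) - 1.0)

def sign_exponent(lam, De, sigma2):
    """Assumed form of the sign embedder's false-negative exponent."""
    if lam <= 0.0:
        return math.inf
    v = De * math.exp(-2 * lam) / (sigma2 * (1.0 - math.exp(-2 * lam)))
    return 0.0 if v <= 1.0 else kl_rate(v)

def improved_sign_exponent(lam, De, sigma2):
    """Assumed form of the improved sign embedder's exponent (erases x when it can)."""
    if lam < 0.5 * math.log(2.0):
        return sign_exponent(lam, De, sigma2)
    v = De / sigma2
    return kl_rate(v) if v > 1.0 else 0.0

# Setting of Figure 2: sigma^2 = 1, D_e = 2.
lam_break = 0.5 * math.log(2.0)
curve = [(lam, sign_exponent(lam, 2.0, 1.0), improved_sign_exponent(lam, 2.0, 1.0))
         for lam in (0.1, 0.2, 0.3, lam_break, 0.4, 0.5, 0.6)]
```

Under these assumptions the two expressions agree at $\lambda = \frac{1}{2}\ln 2$, and the sign embedder's exponent then falls to zero at $\lambda = \frac{1}{2}\ln(1 + D_e/\sigma^2)$ while the improved one stays flat.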
4.3 Comparison to the Additive Embedder

Our next goal is to calculate the exponent of the false-negative probability of the linear additive embedder $y = x + \sqrt{D_e}\,u$, where a normalized correlation detector is employed. Again, we first calculate a threshold value used by the detector which ensures a false-positive probability not greater than $e^{-n\lambda}$. The false-positive probability is given by

$$P_{fp} = \Pr\left\{\hat{\rho}_{uy} > T \mid H_0\right\} = \Pr\left\{\frac{\langle u, X\rangle}{\|u\| \cdot \|X\|} > T\right\} = \frac{A_n(\theta)}{A_n(\pi)} \doteq e^{n\ln(\sin\theta)} \tag{44}$$

where $\theta = \arccos(T)$ ($0 < \theta \le \pi/2$). The second equality is due to the fact that under $H_0$ we have $Y = X$, and the third equality is, again, due to the radial symmetry of the pdf of $X$. Then, $\ln(\sin\theta) \le -\lambda$ implies:

$$T \ge \cos\left[\arcsin(e^{-\lambda})\right] = \sqrt{1 - e^{-2\lambda}}, \tag{45}$$

and therefore, letting $T = \sqrt{1 - e^{-2\lambda}}$ ensures a false-positive probability exponentially not greater than $e^{-n\lambda}$.

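Note that $\lambda \ge 0$ implies that $T$ must be non-negative. Define

$$\Psi_1(r) \triangleq \arccos\left(\frac{\sqrt{D_e}(T^2 - 1) + T\sqrt{r - D_e(1 - T^2)}}{\sqrt{r}}\right) \tag{46}$$

and define the false-negative exponent of the additive embedder

$$E_{fn}^{ae} \triangleq \lim_{n \to \infty} -\frac{1}{n}\ln P_{fn}, \tag{47}$$

where the false-negative probability is given by

$$P_{fn} = \Pr\left\{\hat{\rho}_{uy} \le \sqrt{1 - e^{-2\lambda}} \mid H_1\right\}. \tag{48}$$

Theorem 4. The false-negative exponent of the additive embedder is given by

$$E_{fn}^{ae}(\lambda, D_e) = \min\left\{P_1(\lambda, D_e), P_2(\lambda, D_e)\right\} \tag{49}$$

where

$$P_1(\lambda, D_e) = \min_{D_e e^{-2\lambda} < r \le \frac{D_e e^{-2\lambda}}{1 - e^{-2\lambda}}} \frac{1}{2}\left[\frac{r}{\sigma^2} - \ln\left(\frac{r}{\sigma^2}\right) - 2\ln\sin\left(\Psi_1(r)\right) - 1\right],$$

$$P_2(\lambda, D_e) = \begin{cases} 0, & \dfrac{D_e e^{-2\lambda}}{1 - e^{-2\lambda}} \le \sigma^2 \\[2ex] \dfrac{1}{2}\left[\dfrac{D_e e^{-2\lambda}}{(1 - e^{-2\lambda})\sigma^2} - \ln\left(\dfrac{D_e e^{-2\lambda}}{(1 - e^{-2\lambda})\sigma^2}\right) - 1\right], & \text{else.} \end{cases} \tag{50}$$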



Ef n {\D e ) < Ef n {\,D e ) for > a 2 and 



Let us examine some of the properties of $E_{fn}^{ae}(\lambda, D_e)$. It is easy to see that $E_{fn}^{ae}(\lambda, D_e) \le P_2(\lambda, D_e) = E_{fn}^{se}(\lambda, D_e)$, i.e., the upper bound on the additive embedder exponent serves as a lower bound on the optimal-embedder exponent. It is clear that $E_{fn}^{ae}(\lambda, 0) = 0$ since $E_{fn}^{ae}(\lambda, 0) \le E_{fn}^{se}(\lambda, 0) = 0$. In contrast to the sign embedder, it turns out that $E_{fn}^{ae}(0, D_e) < \infty$. To see why this is the case let us look at

$$P_1(0, D_e) = \min_{r > D_e} f(r) \tag{51}$$

where $f(r) = \frac{1}{2}\left[\frac{r}{\sigma^2} - \ln\left(\frac{r}{\sigma^2}\right) - 2\ln\sin\left(\Psi_1(r)\right) - 1\right]$. Now, since $f(r)$ is finite for $r > D_e$, the minimum value of $f(r)$ must be finite too. This is the case where the threshold value equals zero and the probability that there is an embedded vector $Y$ with negative correlation to $u$ is not zero. Clearly, for a given $D_e$, $E_{fn}^{ae}(\lambda, D_e) = 0$ for $\lambda \ge \frac{1}{2}\ln\left(1 + \frac{D_e}{\sigma^2}\right)$. Numerical calculations show that this happens even for smaller values of $\lambda$; however, the exact smallest value of $\lambda$ for which $E_{fn}^{ae}(\lambda, D_e) = 0$ is hard to find. In Figures 3, 4 and 5 we compare the two embedding strategies by plotting their exponents as functions of $\sigma^2/D_e$.
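A rough numerical comparison of the two exponents can be obtained by a grid search over $r$. The sketch below rests on assumptions about the garbled formulas: the angle threshold is taken as $\cos\Psi_1(r) = \left[\sqrt{D_e}(T^2 - 1) + T\sqrt{r - D_e(1 - T^2)}\right]/\sqrt{r}$ with $T^2 = 1 - e^{-2\lambda}$, the search range for $r$ is $\left(D_e e^{-2\lambda},\ D_e e^{-2\lambda}/(1 - e^{-2\lambda})\right]$, and the second term of the minimum is the sign embedder's exponent. It is restricted to $\lambda > 0$ so that the range is finite. The function names are hypothetical.

```python
import math

def rate(v):
    """0.5*(v - ln v - 1) for v > 0."""
    return 0.5 * (v - math.log(v) - 1.0)

def p2(lam, De, sigma2):
    """Assumed second term: coincides with the sign embedder's exponent."""
    v = De * math.exp(-2 * lam) / (sigma2 * (1.0 - math.exp(-2 * lam)))
    return 0.0 if v <= 1.0 else rate(v)

def p1(lam, De, sigma2, steps=4000):
    """Grid search for the assumed first term (requires lam > 0)."""
    t2 = 1.0 - math.exp(-2 * lam)   # T^2
    t = math.sqrt(t2)
    lo = De * (1.0 - t2)            # smallest covertext energy allowing a miss
    hi = lo / t2                    # energy at which the angle threshold reaches pi/2
    best = math.inf
    for k in range(1, steps + 1):
        r = lo + (hi - lo) * k / steps
        cos_psi = (math.sqrt(De) * (t2 - 1.0) + t * math.sqrt(r - lo)) / math.sqrt(r)
        sin2 = max(1.0 - cos_psi * cos_psi, 1e-300)
        best = min(best, rate(r / sigma2) - 0.5 * math.log(sin2))
    return best

def additive_exponent(lam, De, sigma2):
    return min(p1(lam, De, sigma2), p2(lam, De, sigma2))

e_additive = additive_exponent(0.2, 1.0, 1.0)
e_sign = p2(0.2, 1.0, 1.0)
```

For $\lambda = 0.2$ and $\sigma^2 = D_e = 1$, the grid search yields an additive exponent well below the sign embedder's value, in line with the qualitative comparison in the text.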
Figure 3: Error exponents of the two embedding strategies ($\sigma^2/D_e = 0.1$).

Figure 4: Error exponents of the two embedding strategies ($\sigma^2/D_e = 1$).

Figure 5: Error exponents of the two embedding strategies ($\sigma^2/D_e = 10$).

4.4 Discussion 

When we take a closer look at the results, the fact that the sign embedder achieves better performance should not surprise us. Clearly, when the correlation between $x$ and $u$ is non-negative, the additive embedder and the sign embedder achieve the same performance. However, when the correlation between $x$ and $u$ is negative (this happens with probability 1/2 due to the radial symmetry of the pdf of the covertext), this is no longer true. In this case, the additive embedder tries to maximize the correlation $\rho$ between the covertext $x$ and the watermark $u$ (while the detector compares the normalized correlation $\hat{\rho}_{yu}$ between $y$ and $u$ to a given threshold); however, these efforts are turned in the wrong direction. In contrast to the additive embedding scheme, the sign embedder tries to maximize the absolute value of the correlation $\rho$, while the detector compares the absolute value of the normalized correlation to a given threshold. In this case, the sign embedder tries to minimize the correlation $\rho$. This difference is best exemplified in the case where $\lambda = 0$. In this case, the sign embedder achieves $E_{fn}^{se}(0, D_e) = \infty$ while $E_{fn}^{ae}(0, D_e)$ is finite, since the probability of embedded vectors $Y$ for which $\hat{\rho}_{yu} < 0$ is not zero.

We note that although the sign embedder is suboptimal, it achieves much better performance than the additive embedder with only a slight increase in complexity, which is due to the calculation of $\mathrm{sgn}(\rho)$.

5 Attacks 

Let us now extend the setup to include attacks. We first discuss attacks in general and then confine our 
attention to memoryless attacks. In Section 6, we will discuss general worst-case attacks. 

The case of attack is characterized by the fact that the input to the detector is no longer the vector $y$ as before, but another vector, $z = (z_1, \ldots, z_n)$, that is the output of a channel fed by $y$, which we shall denote by $W_n(z|y)$, as is shown in Fig. 1. For convenience, we will assume that the components of $z$ take on values in the same alphabet $\mathcal{A}$, which will be assumed again to be finite, as in Sections 2 and 3. Thus, the operation of the attack, which in general may be stochastic, is thought of as a channel. Denoting the channel output marginal by $Q(z) = \sum_{y} P_X(y) W_n(z|y)$, the analysis of this case is, in principle, the same as before.

Assuming, for example, that $Q$ is memoryless (which is the case when both $P_X$ and $W_n$ are memoryless, i.e., $W_n(z|y) = \prod_{i=1}^n W(z_i|y_i)$ for some discrete memoryless channel $W : \mathcal{A} \to \mathcal{A}$), then $\Lambda_*$ is as in Section 2, except that $P_X$, $Y$, and $y$ should be replaced by $Q$, $Z$ and $z$, respectively. The optimal embedder then becomes

$$f_n^*(x, u) = \arg\min_{y:\ d_e(x,y) \le nD_e} \sum_{z \in \Lambda_*^c} W_n(z|y), \tag{52}$$

for the redefined version of $\Lambda_*^c$, which is given by:

$$\Lambda_*^c = \left\{z : \ln Q(z) + n\hat{H}_{zu}(Z|U) + n\lambda - |\mathcal{A}|\ln(n+1) > 0\right\} \tag{53}$$

$$= \left\{z : -n\hat{I}_{zu}(Z;U) - n\mathcal{D}(\hat{P}_z\|Q) + n\lambda - |\mathcal{A}|\ln(n+1) > 0\right\}, \tag{54}$$

where $\hat{P}_z$ is the empirical distribution of $z$. Evidently, eq. (52) is not a convenient formula to work with. Therefore, let us try to simplify (52). For a given $y$, let us rewrite (52) as follows:

$$\sum_{z \in \Lambda_*^c} W_n(z|y) = \sum_{T(z|y,u) \subseteq \Lambda_*^c}\ \sum_{z' \in T(z|y,u)} W_n(z'|y) = \sum_{T(z|y,u) \subseteq \Lambda_*^c} |T(z|y,u)|\, W_n(z|y). \tag{55}$$

It is easy to show that for a given $z' \in T(z|y,u)$ and a memoryless channel $W_n(z|y)$, the probability of $z'$ given $y$ is given by the following expression:

$$W_n(z'|y) = \exp\left\{-n\left[\hat{H}_{yz}(Z|Y) + \sum_{a \in \mathcal{A}} \hat{P}_y(a)\, \mathcal{D}\left(\hat{P}_{yz}(Z|Y=a)\,\|\,W(Z|Y=a)\right)\right]\right\}. \tag{56}$$

Using the fact that the cardinality of $T(z|y,u)$ is given by

$$|T(z|y,u)| \doteq e^{n\hat{H}_{uyz}(Z|Y,U)}, \tag{57}$$

we conclude that $f_n^*(x,u) \in T^*(y|x,u)$, where $T^*(y|x,u)$ corresponds to the following conditional empirical distribution:

$$\hat{P}^*_{uxy}(Y|X,U) = \arg\max_{\substack{\hat{P}_{uxy}(Y|X,U):\\ \hat{E}_{xy} d_e(X,Y) \le D_e}}\ \min_{\substack{\hat{P}_{uyz}(Z|Y,U):\\ \hat{I}_{uz}(Z;U) + \mathcal{D}(\hat{P}_z\|Q) \le \lambda}} \left[\hat{I}_{uyz}(Z;U|Y) + \sum_{a \in \mathcal{A}} \hat{P}_y(a)\, \mathcal{D}\left(\hat{P}_{yz}(Z|Y=a)\,\|\,W(Z|Y=a)\right)\right], \tag{58}$$

i.e., for a given $u$ and $x$, we search for the empirical distribution $\hat{P}_{uxy}(Y|X,U)$ which maximizes the exponent of the false-negative probability dictated by the dominating conditional type $T(z|y,u)$ in $\Lambda_*^c$. Once the optimal empirical distribution $\hat{P}^*_{uxy}(Y|X,U)$ has been found, it does not matter which vector $y$ is chosen from the corresponding conditional type $T^*(y|x,u)$.
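The type-based expression for the probability of a sequence under a memoryless channel is an exact identity, which the following sketch verifies for a small example. The channel matrix, the sequence length and the function names are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter

def log_prob_direct(y, z, W):
    """ln W_n(z|y) for a memoryless channel: sum of per-letter log-probabilities."""
    return sum(math.log(W[a][b]) for a, b in zip(y, z))

def log_prob_via_types(y, z, W):
    """-n * [H(Z|Y) + sum_a P_y(a) D(P(Z|Y=a) || W(.|a))], using empirical distributions."""
    n = len(y)
    joint = Counter(zip(y, z))
    marg = Counter(y)
    h_cond = 0.0  # empirical conditional entropy H(Z|Y), in nats
    div = 0.0     # P_y-weighted divergence between empirical and true channel rows
    for (a, b), cnt in joint.items():
        p_ab = cnt / n
        cond = cnt / marg[a]  # empirical P(Z=b | Y=a)
        h_cond -= p_ab * math.log(cond)
        div += p_ab * math.log(cond / W[a][b])
    return -n * (h_cond + div)

# Hypothetical binary channel W[a][b] = W(b|a) and random test sequences.
W = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
rng = random.Random(3)
y = [rng.randint(0, 1) for _ in range(60)]
z = [rng.randint(0, 1) for _ in range(60)]
lp_direct = log_prob_direct(y, z, W)
lp_types = log_prob_via_types(y, z, W)
```

Because the right-hand side depends on $(y, z)$ only through their joint empirical distribution, every $z'$ in the same conditional type class receives the same probability, which is what allows the sum over a type class to collapse into a cardinality times a single term.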






6 General Attack Channel 



In this section we extend the results of the previous sections to include general attack channels subject 
to a distortion criterion. 

Consider a covertext sequence $x = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ emitted from a memoryless source $P_X$ as before. Let $d_a : \mathcal{Y} \times \mathcal{Z} \to \mathbb{R}^+$ denote another bounded single-letter distortion measure. An attacker subject to distortion level $D_a$ w.r.t. $d_a$ is a channel $W_n$, fed by a stegotext $y$, which produces a forgery $z$ such that

$$d_a(y, z) = \sum_{i=1}^n d_a(y_i, z_i) \le nD_a \quad \forall (y, z) \in \mathcal{A}^n \times \mathcal{A}^n. \tag{59}$$

We denote the set of attack channels which satisfy (59) by $\mathcal{W}_n(D_a)$.

For a given $u$, we would like to devise a decision rule that partitions the space $\mathcal{A}^n$ of sequences $\{z\}$, observed by the detector, into two complementary regions, $\Lambda$ and $\Lambda^c$, such that for $z \in \Lambda$, we decide in favor of $H_1$ (watermark $u$ is present) and for $z \in \Lambda^c$, we decide in favor of $H_0$ (watermark absent: $y = x$). Consider the Neyman-Pearson criterion of minimizing the worst-case false-negative probability

$$P_{fn} = \max_{W_n \in \mathcal{W}_n(D_a)} P_{fn}(f_n, \Lambda, W_n) \tag{60}$$

where

$$P_{fn}(f_n, \Lambda, W_n) \triangleq \sum_{z \in \Lambda^c}\ \sum_{y \in \mathcal{A}^n} \left(\sum_{x:\ f_n(x,u) = y} P_X(x)\right) W_n(z|y) \tag{61}$$

and $P_X(x) = \prod_{i=1}^n P_X(x_i)$, subject to the following constraints:

(1) The distortion between $x$ and $y$ does not exceed $nD_e$.

(2) The false-positive probability is upper bounded by

$$P_{fp} = \max_{W_n \in \mathcal{W}_n(D_a)} P_{fp}(\Lambda, W_n) \le e^{-n\lambda}, \tag{62}$$

where $\lambda > 0$ is a prescribed constant and

$$P_{fp}(\Lambda, W_n) \triangleq \sum_{z \in \Lambda} \left(\sum_{y \in \mathcal{A}^n} P_X(y) W_n(z|y)\right). \tag{63}$$

In other words, we would like to choose an embedder $f_n$ and a decision region $\Lambda$ so as to minimize $P_{fn}$ subject to a distortion constraint (between the covertext and the stegotext) and the constraint that the exponential decay rate of $P_{fp}$ would be at least as large as $\lambda$, for any attack channel in $\mathcal{W}_n(D_a)$.

Similarly as in Section we focus on the class of detectors which base their decisions on the empirical 
joint distribution of z and u. 

6.1 Strongly Exchangeable Attack Channels 

First, we restrict the set of attack channels to be strongly exchangeable channels (the exact definition will be given in the sequel). Later, this restriction will be dropped, and the attack channel will be allowed to be any member of $\mathcal{W}_n(D_a)$. However, in this case random watermarks (rather than deterministic ones) must be considered.

The use of strongly exchangeable channels in the context of general attack channels was proposed in [22], where Somekh-Baruch and Merhav showed (in another context) that the worst strongly exchangeable attack channel is as bad as the worst general attack channel, while strongly exchangeable channels are much easier to analyze. In the sequel, we will adjust the proof technique proposed in [35] to fit our needs.

Definition 1. A strongly exchangeable channel $W_n$ is one that satisfies, for all $y \in \mathcal{A}^n$, $z \in \mathcal{A}^n$,

$$W_n(z'|y') = W_n(z|y), \quad \forall (y', z') \in T(y, z).$$

Denote the set of all strongly exchangeable channels that operate on $n$-tuples by $\mathcal{C}_n^{ex}$ and let $\mathcal{W}_n^{ex}(D_a) = \mathcal{W}_n(D_a) \cap \mathcal{C}_n^{ex}$.
Define

$$W_n^*(z|y) = \frac{c_n(y)}{|T(z|y)|}\, 1\{d_a(y, z) \le nD_a\}, \tag{64}$$

where $c_n(y) = \left[\sum_{z:\ d_a(y,z) \le nD_a} \frac{1}{|T(z|y)|}\right]^{-1}$ [22, p. 543]. Clearly, $W_n^* \in \mathcal{W}_n^{ex}(D_a)$. Note that $c_n(y)$ equals the reciprocal of the number of conditional types $T(z|y)$ such that $d_a(y, z) \le nD_a$ [22, p. 543], which implies that $(n+1)^{-|\mathcal{A}|^2} \le c_n(y) \le 1$. Hence, $c_n(y)$ decays at most polynomially in $n$.
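To make the construction of $W_n^*$ concrete, the following sketch builds it by enumeration for a short binary sequence. It assumes a Hamming distortion measure and a binary alphabet, neither of which is fixed by the text; the function names are hypothetical. The channel spreads its mass uniformly over the admissible conditional type classes, and then uniformly within each class.

```python
from collections import Counter
from itertools import product

def joint_type(y, z):
    """Joint empirical distribution of (y, z), represented by its pair counts."""
    return frozenset(Counter(zip(y, z)).items())

def hamming(y, z):
    return sum(a != b for a, b in zip(y, z))

def worst_exchangeable_channel(y, n_Da, alphabet=(0, 1)):
    """W*(.|y) as a dict z -> probability, for total distortion budget n_Da (Hamming)."""
    classes = {}
    for z in product(alphabet, repeat=len(y)):
        classes.setdefault(joint_type(y, z), []).append(z)
    # The Hamming distortion is constant on a conditional type class, so one
    # representative per class decides admissibility.
    admissible = [zs for zs in classes.values() if hamming(y, zs[0]) <= n_Da]
    c = 1.0 / len(admissible)  # c_n(y): reciprocal of the number of admissible types
    return {z: c / len(zs) for zs in admissible for z in zs}

y_example = (0, 0, 1, 1)
W_star = worst_exchangeable_channel(y_example, n_Da=2)
```

For $y = (0,0,1,1)$ and a budget of two flips there are six admissible conditional types, so the unmodified sequence (a type class of size one) receives probability $1/6$.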
Define

$$\Lambda_* = \left\{z : \hat{I}_{zu}(Z;U) + \min_{\hat{P}_y:\ \hat{E}_{yz} d_a(Y,Z) \le D_a} \mathcal{D}(\hat{P}_y\|P_X) \ge \frac{|\mathcal{A}|\ln(n+1)}{n} + \lambda\right\}. \tag{65}$$

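Lemma 2. (i) For every $W_n \in \mathcal{W}_n^{ex}(D_a)$,

$$P_{fp}(\Lambda_*, W_n) \le e^{-n(\lambda - \delta_n)},$$

where $\lim_{n \to \infty} \delta_n = 0$.
(ii) For any $\Lambda \subseteq \mathcal{A}^n$ that satisfies

$$P_{fp}(\Lambda, W_n) \le e^{-n\lambda'} \quad \forall W_n \in \mathcal{W}_n^{ex}(D_a)$$

for some $\lambda' > \lambda$, we have $\Lambda_*^c \subseteq \Lambda^c$ for all sufficiently large $n$.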
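Proof. Let $T(z|u) \subseteq \Lambda$. Then, we have

$$\begin{aligned} e^{-n\lambda'} &\ge \max_{W_n \in \mathcal{W}_n^{ex}(D_a)} \sum_{z \in \Lambda} \left(\sum_{y \in \mathcal{A}^n} P_X(y) W_n(z|y)\right) \\ &\ge \sum_{z \in \Lambda} \left(\sum_{y \in \mathcal{A}^n} P_X(y) W_n^*(z|y)\right) \\ &= \sum_{T(z|u) \subseteq \Lambda}\ \sum_{z' \in T(z|u)} \left(\sum_{y \in \mathcal{A}^n} P_X(y) W_n^*(z'|y)\right) \\ &= \sum_{T(z|u) \subseteq \Lambda}\ \sum_{z' \in T(z|u)} Q^*(z'), \end{aligned} \tag{66}$$

where $Q^*(z) \triangleq \sum_{y \in \mathcal{A}^n} P_X(y) W_n^*(z|y)$. Now,

$$\begin{aligned} Q^*(z) &= \sum_{y \in \mathcal{A}^n} P_X(y) W_n^*(z|y) \\ &= \sum_{T(y|z) \subseteq \mathcal{A}^n}\ \sum_{y' \in T(y|z)} P_X(y') W_n^*(z|y') \\ &= \sum_{T(y|z) \subseteq \mathcal{A}^n}\ \sum_{y' \in T(y|z)} P_X(y') \frac{c_n(y')}{|T(z|y')|}\, 1\{d_a(y', z) \le nD_a\} \\ &\doteq \sum_{T(y|z) \subseteq \mathcal{A}^n} |T(y|z)|\, e^{-n[\hat{H}_y(Y) + \mathcal{D}(\hat{P}_y\|P_X)]}\, e^{-n\hat{H}_{yz}(Z|Y)}\, 1\{d_a(y, z) \le nD_a\} \\ &\doteq \exp\left\{-n\left[\hat{H}_z(Z) + \min_{\hat{P}_y:\ \hat{E}_{yz} d_a(Y,Z) \le D_a} \mathcal{D}(\hat{P}_y\|P_X)\right]\right\}, \end{aligned} \tag{67}$$

where the last equality stems from the fact that $c_n(y)$ is polynomial in $n$.
Clearly, for any $z' \in T(z)$ the following holds

$$Q(z) = \sum_{y \in \mathcal{A}^n} P_X(y) W_n(z|y) = \sum_{y \in \mathcal{A}^n} P_X(\pi(y)) W_n(\pi(z)|\pi(y)) = Q(\pi(z)) = Q(z'), \tag{68}$$

where the second equality is because $W_n \in \mathcal{W}_n^{ex}(D_a)$ and $\pi(\cdot)$ is a permutation of $\{1, \ldots, n\}$ such that $z' = \pi(z)$. Hence $Q^*(z') = Q^*(z)$ $\forall z' \in T(z)$. Following (66), we get

$$\begin{aligned} e^{-n\lambda'} &\ge \sum_{T(z|u) \subseteq \Lambda} |T(z|u)|\, Q^*(z) \\ &\ge |T(z|u)|\, Q^*(z) \\ &\doteq |T(z|u)| \exp\left\{-n\left[\hat{H}_z(Z) + \min_{\hat{P}_y:\ \hat{E}_{yz} d_a(Y,Z) \le D_a} \mathcal{D}(\hat{P}_y\|P_X)\right]\right\} \\ &\ge \exp\left\{-n\left[\hat{H}_z(Z) - \hat{H}_{zu}(Z|U) + \min_{\hat{P}_y:\ \hat{E}_{yz} d_a(Y,Z) \le D_a} \mathcal{D}(\hat{P}_y\|P_X)\right]\right\}(n+1)^{-|\mathcal{A}|} \\ &= \exp\left\{-n\left[\hat{I}_{zu}(Z;U) + \min_{\hat{P}_y:\ \hat{E}_{yz} d_a(Y,Z) \le D_a} \mathcal{D}(\hat{P}_y\|P_X)\right]\right\}(n+1)^{-|\mathcal{A}|}. \end{aligned} \tag{69}$$

In the same spirit as in the attack-free scenario, we have shown that every $T(z|u)$ in $\Lambda$ is also in $\Lambda_*$. Therefore, $\Lambda_*^c \subseteq \Lambda^c$ and so the probability of $\Lambda_*^c$ is smaller than the probability of $\Lambda^c$, i.e., $\Lambda_*^c$ minimizes $P_{fn}$ among all $\Lambda^c$ corresponding to detectors that satisfy (62). It remains to show that $\Lambda_*$ itself has a false-positive exponent which is at least as large as $\lambda$ for sufficiently large $n$.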
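Clearly, for any attack channel $W_n \in \mathcal{W}_n^{ex}(D_a)$,

$$W_n(z|y) = \frac{\sum_{z' \in T(z|y)} W_n(z'|y)}{|T(z|y)|} = \frac{W_n(T(z|y)\,|\,y)}{|T(z|y)|} \le \frac{1\{d_a(y, z) \le nD_a\}}{|T(z|y)|}, \tag{70}$$

where the first equality is because $W_n(z'|y) = W_n(z|y)$ $\forall z' \in T(z|y)$. Moreover, similarly as in (67), combined with the fact that $c_n(y)$ is polynomial in $n$, we have

$$\sum_{y \in \mathcal{A}^n} P_X(y) \frac{1\{d_a(y, z') \le nD_a\}}{|T(z'|y)|} \doteq \exp\left\{-n\left[\hat{H}_z(Z) + \min_{\hat{P}_y:\ \hat{E}_{yz} d_a(Y,Z) \le D_a} \mathcal{D}(\hat{P}_y\|P_X)\right]\right\}. \tag{71}$$

Using (70) and (71), it follows that $\Lambda_*$ indeed fulfills the false-positive constraint for any attack channel $W_n \in \mathcal{W}_n^{ex}(D_a)$:

$$\begin{aligned} \max_{W_n \in \mathcal{W}_n^{ex}(D_a)} P_{fp}(\Lambda_*, W_n) &= \max_{W_n \in \mathcal{W}_n^{ex}(D_a)} \sum_{z \in \Lambda_*}\ \sum_{y \in \mathcal{A}^n} P_X(y) W_n(z|y) \\ &\le \sum_{T(z|u) \subseteq \Lambda_*}\ \sum_{z' \in T(z|u)} \left(\sum_{y \in \mathcal{A}^n} P_X(y) \frac{1\{d_a(y, z') \le nD_a\}}{|T(z'|y)|}\right) \\ &\doteq \sum_{T(z|u) \subseteq \Lambda_*}\ \sum_{z' \in T(z|u)} \exp\left\{-n\left(\hat{H}_z(Z) + \min_{\hat{P}_y:\ \hat{E}_{yz} d_a(Y,Z) \le D_a} \mathcal{D}(\hat{P}_y\|P_X)\right)\right\} \\ &= \sum_{T(z|u) \subseteq \Lambda_*} |T(z|u)| \exp\left\{-n\left(\hat{H}_z(Z) + \min_{\hat{P}_y:\ \hat{E}_{yz} d_a(Y,Z) \le D_a} \mathcal{D}(\hat{P}_y\|P_X)\right)\right\} \\ &\le \sum_{T(z|u) \subseteq \Lambda_*} \exp\left\{-n\left(\hat{I}_{zu}(Z;U) + \min_{\hat{P}_y:\ \hat{E}_{yz} d_a(Y,Z) \le D_a} \mathcal{D}(\hat{P}_y\|P_X)\right)\right\} \\ &\le (n+1)^{|\mathcal{A}|} e^{-n\lambda} \\ &= e^{-n(\lambda - \delta_n)}, \end{aligned} \tag{72}$$

where $\delta_n = \frac{|\mathcal{A}|\ln(n+1)}{n} \to 0$ as $n \to \infty$. □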



Our next step is to find an embedder which minimizes the probability of false negative under the given decision region for any attack channel $W_n \in \mathcal{W}_n^{ex}(D_a)$. Following the same reasoning as before, the optimal embedder can be written as follows:

$$f_n^*(x, u) = \arg\min_{y:\ d_e(x,y) \le nD_e}\ \max_{W_n \in \mathcal{W}_n^{ex}(D_a)} \sum_{z \in \Lambda_*^c} W_n(z|y). \tag{73}$$

Lemma 3. For any attack channel $W_n \in \mathcal{W}_n^{ex}(D_a)$, the optimal embedder $f_n^*$ which minimizes the false-negative probability can be expressed in the following manner:

$$f_n^*(x, u) = y, \quad y \in T^*(y|x,u), \tag{74}$$

where $T^*(y|x,u)$ corresponds to the following conditional empirical distribution:

$$\hat{P}^*_{uxy}(Y|X,U) = \arg\max_{\substack{\hat{P}_{uxy}(Y|X,U):\\ \hat{E}_{xy} d_e(X,Y) \le D_e}}\ \min_{\substack{\hat{P}_{uyz}(Z|Y,U):\\ \hat{I}_{uz}(Z;U) + \min_{\hat{P}_y:\ \hat{E}_{yz} d_a(Y,Z) \le D_a} \mathcal{D}(\hat{P}_y\|P_X) \le \lambda}} \hat{I}_{uyz}(Z;U|Y). \tag{75}$$

Proof. For a given $y \in \mathcal{A}^n$,

$$\begin{aligned} \max_{W_n \in \mathcal{W}_n^{ex}(D_a)} \sum_{z \in \Lambda_*^c} W_n(z|y) &= \max_{W_n \in \mathcal{W}_n^{ex}(D_a)} \sum_{T(z|y,u) \subseteq \Lambda_*^c}\ \sum_{z' \in T(z|y,u)} W_n(z'|y) \\ &\le \sum_{T(z|y,u) \subseteq \Lambda_*^c}\ \sum_{z' \in T(z|y,u)} |T(z'|y)|^{-1}\, 1\{d_a(y, z') \le nD_a\} \\ &\le \sum_{T(z|y,u) \subseteq \Lambda_*^c} |T(z|y,u)| \cdot |T(z|y)|^{-1}\, 1\{d_a(y, z) \le nD_a\} \\ &\doteq \max_{T(z|y,u) \subseteq \Lambda_*^c} e^{-n\hat{I}_{uyz}(Z;U|Y)}. \end{aligned} \tag{76}$$

Therefore $f_n^*(x,u) \in T^*(y|x,u)$, where $T^*(y|x,u)$ corresponds to the conditional empirical distribution (75). □



Note that the optimal embedder and the optimal decision rule correspond to the case where the detector and the embedder are tuned to the worst possible channel $W_n^*$. To extend the above results to general attack channels (i.e., channels that are members of $\mathcal{W}_n(D_a)$ rather than $\mathcal{W}_n^{ex}(D_a)$) we must consider the random watermark setting (cf. Subsection 3.4). The reason for this will be made clear in the sequel.

6.2 Random Watermarks and General Attack Channels 

In the spirit of Subsection 3.4, from this point on, we will use the model in which $u$ is random as well, in particular, being drawn from another source $P_U$, independently of $x$, normally, the binary symmetric source (BSS). In this case, the decision regions $\Lambda$ and $\Lambda^c$ will be defined as subsets of $\mathcal{A}^n \times \mathcal{B}^n$ and the probabilities of error $P_{fn}$ and $P_{fp}$ will be defined, again, as the corresponding summations of products $P_X(x) P_U(u)$.

The corresponding version of $\Lambda_*$, proposed for strongly exchangeable attack channels, would be:

$$\Lambda_{**} = \left\{(z, u) : \hat{I}_{zu}(Z;U) + \mathcal{D}(\hat{P}_u\|P_U) + \min_{\hat{P}_y:\ \hat{E}_{yz} d_a(Y,Z) \le D_a} \mathcal{D}(\hat{P}_y\|P_X) \ge \frac{|\mathcal{A}|\ln(n+1)}{n} + \lambda\right\}. \tag{77}$$

Theorem 5. (i) For every $W_n \in \mathcal{W}_n(D_a)$,

$$P_{fp}(\Lambda_{**}, W_n) \le e^{-n(\lambda - \delta_n)},$$

where $\lim_{n \to \infty} \delta_n = 0$.
(ii) For any $\Lambda \subseteq \mathcal{A}^n \times \mathcal{B}^n$ that satisfies

$$P_{fp}(\Lambda, W_n) \le e^{-n\lambda'} \quad \forall W_n \in \mathcal{W}_n(D_a)$$

for some $\lambda' > \lambda$, we have $\Lambda_{**}^c \subseteq \Lambda^c$ for all sufficiently large $n$.
To prove the above theorem in the case of general attack channels, we first need to ensure that the probability of false positive under $\Lambda_{**}$ will be smaller than $e^{-n\lambda}$ for any attack channel in $\mathcal{W}_n(D_a)$. We use an argument, which was used in [22, Lemma 4], to prove that the worst strongly exchangeable attack channel is as bad as the worst general channel, and therefore we can reuse the results of Lemma 2. For the sake of completeness, we will rephrase the argument and adjust it to our problem.

Proof. Given a general attack channel $W_n \in \mathcal{W}_n(D_a)$, let $\pi$ denote a permutation of $\{1, \ldots, n\}$ and let $W_n^\pi(z|y) = W_n(\pi(z)|\pi(y))$. Clearly,

$$\bar{W}_n(z|y) = \frac{1}{n!}\sum_{\pi} W_n^\pi(z|y)$$

is a strongly exchangeable channel. For a given $W_n \in \mathcal{W}_n(D_a)$, let the false-positive probability under $\Lambda$ be

$$P_{fp}(\Lambda, W_n) = \sum_{u} P_U(u) \sum_{z \in \Lambda(u)} \left(\sum_{y \in \mathcal{A}^n} P_X(y) W_n(z|y)\right), \tag{78}$$

where $\Lambda(u) = \{z : (z, u) \in \Lambda\}$. Recall that any decision region $\Lambda$ is a union of joint type classes $\{T(u, z)\}$. Since $P_{fp}(\Lambda, W_n)$ is affine in $W_n$, we can see that

$$P_{fp}(\Lambda, \bar{W}_n) = \sum_{u} P_U(u) \sum_{z \in \Lambda(u)} \left(\sum_{y} P_X(y) \frac{1}{n!}\sum_{\pi} W_n^\pi(z|y)\right) = \frac{1}{n!}\sum_{\pi} P_{fp}(\Lambda, W_n^\pi). \tag{79}$$

Now, for a given permutation $\pi$,

$$\begin{aligned} P_{fp}(\Lambda, W_n^\pi) &= \sum_{u} P_U(u) \sum_{z \in \Lambda(u)} \left(\sum_{y} P_X(y) W_n(\pi(z)|\pi(y))\right) \\ &= \sum_{u} P_U(\pi(u)) \sum_{z:\ \pi(z) \in \Lambda(\pi(u))} \left(\sum_{y} P_X(\pi(y)) W_n(\pi(z)|\pi(y))\right) \\ &= \sum_{u} P_U(u) \sum_{z \in \Lambda(u)} \left(\sum_{y} P_X(y) W_n(z|y)\right) \\ &= P_{fp}(\Lambda, W_n), \end{aligned} \tag{80}$$

where the second equality follows since $z \in \Lambda(u) \Leftrightarrow \pi(z) \in \Lambda(\pi(u))$ (and that is because $\Lambda$ is a union of joint type classes $\{T(u, z)\}$) and the third equality follows from the fact that $P_X$ and $P_U$ are memoryless, which implies that $P_X(\pi(y)) = P_X(y)$ and $P_U(\pi(u)) = P_U(u)$.
From (79) and (80), we get that for any $\Lambda$,

$$P_{fp}(\Lambda, \bar{W}_n) = \frac{1}{n!}\sum_{\pi} P_{fp}(\Lambda, W_n) = P_{fp}(\Lambda, W_n). \tag{81}$$

Therefore, for any $\Lambda$,

$$\max_{W_n \in \mathcal{W}_n(D_a)} P_{fp}(W_n, \Lambda) = \max_{W_n \in \mathcal{W}_n^{ex}(D_a)} P_{fp}(W_n, \Lambda). \tag{82}$$

Hence, the worst general attack channel is not worse than the worst strongly exchangeable channel, and therefore we can confine our search to the set of strongly exchangeable channels, under which $\Lambda_{**}$, defined in (77), is optimal. Using a proof similar to that of Lemma 2, it is easy to show that indeed under $\Lambda_{**}$ the false-positive probability is not greater than $\exp\{-n(\lambda - \delta_n)\}$, where $\lim_{n \to \infty} \delta_n = 0$. □
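The symmetrization step in the proof can be illustrated by brute force for a very short block length. The sketch below draws an arbitrary random channel on binary triples (a purely hypothetical example), averages it over all coordinate permutations, and then checks that the averaged channel is constant on joint type classes of $(y, z)$, i.e., strongly exchangeable, and that it remains a valid channel.

```python
import random
from collections import Counter
from itertools import permutations, product

def permute(seq, pi):
    return tuple(seq[i] for i in pi)

def symmetrize(W, n, alphabet=(0, 1)):
    """Average W(pi(z) | pi(y)) over all permutations pi of the n coordinates."""
    perms = list(permutations(range(n)))
    seqs = list(product(alphabet, repeat=n))
    return {(y, z): sum(W[(permute(y, pi), permute(z, pi))] for pi in perms) / len(perms)
            for y in seqs for z in seqs}

# Hypothetical random channel on binary triples: each row W(.|y) is normalized.
rng = random.Random(2)
n = 3
seqs = list(product((0, 1), repeat=n))
W = {}
for y in seqs:
    weights = [rng.random() for _ in seqs]
    total = sum(weights)
    for z, w in zip(seqs, weights):
        W[(y, z)] = w / total

W_bar = symmetrize(W, n)

# Group the averaged channel values by the joint type of (y, z).
by_type = {}
for (y, z), v in W_bar.items():
    by_type.setdefault(frozenset(Counter(zip(y, z)).items()), []).append(v)
```

Any two pairs with the same joint type differ by a common permutation of coordinates, and averaging over the whole permutation group absorbs that permutation, which is why the spread within each group is zero up to rounding.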

Note that the summation over u (and the fact that any A is a union of types) enabled us the use of this 
argument, which might suggest that for a deterministic watermark, a general attack channel is worse than 
the worst strongly exchangeable channel. However, this channel might be dependent on the watermark 
sequence which is not available to the attacker. This is exactly the reason why random watermark setting 
is considered in the general attack scenario. 

Once again, it is easy to verify that A,* does not violate the false-positive probability constraint under 
general attack channel while minimizing the false-negative probability. 

We now proceed to find the optimal embedder. The false-negative probability for a given attack 
channel W n , embedder /„, and decision region A can be written as follow 

Pfn{fn,KW n ) = E Pu(u)P fn (f n ,A(u),W n ), (83) 

where 

P fn (f n ,A(u),W n ) = EE E Px(x)\w n (z\y). (84) 

zeA"(u) yeA~ \x:f n (x,u)=y J 

Corollary 2. For any attack channel W n G W n {D a ), the optimal embedder /** which minimizes the 
false-negative probability is the embedder defined in J74| j- 

Proof. Clearly, for any u 6 B n 

min max P/ n (/n, A**(-u), W n ) > min max Pf n (fn, A**(u), W n ) 

f n {X,U): W n eW„(D a ) f n (X,U): W n eW^(D a ) 

d e {X,y)<nD c de(X,y)<nD c 

max P /n (/*,A.,(u), W„) , (85) 

but on the other hand 

min max P/,,(/„,A„,l¥„) < max P/ n (/*, A,*, W„) 

d^{X.y)<nD a 

max P fn {f*,A^,W n ) , (86) 
w n ew^{D a ) 
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where the last equality can easily be obtained from the above argument when applied to embedders 
which use a certain conditional type T(y\x,u) to produce the stegotext (as /*). Therefore, the optimal 
embedder in the case of a general attack channel is /*, proposed in Theorem [3] □ 

Note that from (74) and (77), the false-negative error exponent can be expressed in closed form using the method of types [57].

6.3 Discussion 

In this section, we extended the basic setup presented in Section [5] to the case of general attack channels. First, we solved the problem for the case where the watermark sequence is deterministic, under strongly exchangeable channels. Then, we treated the case of general attack channels, but had to assume that the watermark sequence $u$ is random too. This should not surprise us. Clearly, for a given watermark, the worst attack channel depends on the watermark (although the watermark is not known to the attacker). In this case, the attacker can imitate the detector operation: first, it decides which hypothesis is more likely (using a decision rule similar to the one used by the detector). Then, it can try to "push" the stegotext in the wrong direction, causing a false detection. A similar behavior can be seen in the case of a random watermark message $u$ and a deterministic covertext sequence $x$. If $d_e = d_a$ and $D_a > D_e$, the worst channel (which does depend on the covertext $x$) is the following: if $y \ne x$ (hypothesis $H_1$) then $z = x$, i.e., the channel completely erases the message; otherwise (hypothesis $H_0$) the channel tries to "push" $y$ to $\Lambda$. In this case, both the false-negative probability and the false-positive probability might converge to one. The reason is rooted in the fact that the set of attack channels has not been limited. In Subsection 6.1 we restricted the class of attack channels to strongly exchangeable channels and obtained non-trivial results. Other limitations may be imposed on the attack channels (e.g., blockwise memoryless or finite-state channels) if meaningful results are to be obtained.

Note that the worst attack strategy $W^*$ is independent of $\lambda$, of the covertext distribution $P_X$, and even of the embedder strategy and its distortion level $D_e$ (assuming that the embedder uses a certain type $T(y|x,u)$ to produce the stegotext). The attack strategy depends only on the allowable distortion level $D_a$. Therefore, the embedding strategy can be designed assuming that the worst attack channel is present. This can be useful in evaluating the performance (in terms of false-negative probability) of suboptimal embedders.

Appendix 

Proof of Theorem. First, we explore the case where $a = 0$, i.e., $y = bu$. Substituting $a = 0$ in the constraint of eq. (32), we get that $b^2 - 2\rho b + (\alpha^2 - D_e) \le 0$. The fact that $b$ is a real number implies that the discriminant of $b^2 - 2\rho b + (\alpha^2 - D_e)$ is non-negative, which leads to $\rho^2 - (\alpha^2 - D_e) \ge 0$, or $D_e \ge \alpha^2 - \rho^2$. This corresponds to the case where the stegotext includes only a fraction of $u$ without violating the distortion constraint. In this case, the false-negative probability is zero (the distortion constraint is so loose that it allows the covertext to be "erased"), and we can choose $b^* = \rho + \sqrt{\rho^2 - \alpha^2 + D_e}$ as the optimal solution. From now on, we assume that $D_e < \alpha^2 - \rho^2$, which means that $a = 0$ is not a legitimate solution. Let us assume that $\rho > 0$. Define $t = b/a$, and rewrite (52) by dividing the numerator and denominator by $a^2$:

$$\max_t f(t) \quad \text{subject to: } a^2 t^2 + 2(a-1)a\rho t + (a-1)^2\alpha^2 \le D_e, \qquad \text{(A-1)}$$

where

$$f(t) = \frac{(t+\rho)^2}{(t+\rho)^2 + (\alpha^2 - \rho^2)}.$$

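It is easy to show that maximizing $f(t)$ is equivalent to maximizing $t$. Since $t$ is a real number, the discriminant of $[a^2 t^2 + 2(a-1)a\rho t + (a-1)^2\alpha^2 - D_e]$ must be non-negative, i.e.,

$$\Delta = 4a^2\left[D_e - (a-1)^2(\alpha^2 - \rho^2)\right] \ge 0, \qquad \text{(A-2)}$$

which leads to

$$1 - \sqrt{\frac{D_e}{\alpha^2 - \rho^2}} \le a \le 1 + \sqrt{\frac{D_e}{\alpha^2 - \rho^2}}. \qquad \text{(A-3)}$$

Hence, $a$ must be in the range $\mathcal{R} = \left[1 - \sqrt{\frac{D_e}{\alpha^2 - \rho^2}},\ 1 + \sqrt{\frac{D_e}{\alpha^2 - \rho^2}}\right]$.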


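Let us rewrite the constraint as follows:

$$[at + (a-1)\rho]^2 + (a-1)^2(\alpha^2 - \rho^2) - D_e \le 0; \qquad \text{(A-4)}$$

consequently,

$$\frac{(1-a)\rho - \sqrt{D_e - (a-1)^2(\alpha^2 - \rho^2)}}{a} \le t \le \frac{(1-a)\rho + \sqrt{D_e - (a-1)^2(\alpha^2 - \rho^2)}}{a}. \qquad \text{(A-5)}$$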

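Our next step will be to maximize the upper bound on $t$ over the allowable range of $a$:

$$\arg\max_{a \in \mathcal{R}} t(a), \qquad \text{(A-6)}$$

where

$$t(a) = \frac{(1-a)\rho + \sqrt{D_e - (a-1)^2(\alpha^2 - \rho^2)}}{a}. \qquad \text{(A-7)}$$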



After differentiating with respect to $a$ and equating to zero, we get

$$a_{1,2} = \frac{(\alpha^2 - \rho^2)(\alpha^2 - D_e) \pm \sqrt{D_e\rho^2(\alpha^2 - \rho^2)(\alpha^2 - D_e)}}{\alpha^2(\alpha^2 - \rho^2)}. \qquad \text{(A-8)}$$



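Accordingly, the optimal values of $a$ and $b$ are

$$(a^*, b^*) = \left( \arg\max \left\{ t(a) \,\middle|\, a \in \{a_1, a_2, a_3, a_4\} \cap \mathcal{R} \right\},\ a^* \cdot t(a^*) \right), \qquad \text{(A-9)}$$

where $a_{3,4} = 1 \pm \sqrt{\frac{D_e}{\alpha^2 - \rho^2}}$. The same results are obtained in the case where $\rho < 0$. □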
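The candidate search in (A-3) and (A-7)-(A-9) is easy to evaluate numerically. The following is a minimal sketch, assuming $\rho > 0$ and scalar inputs $\alpha^2$, $\rho$, $D_e$; the function names `t_upper` and `optimal_coeffs` are hypothetical and introduced only for illustration.

```python
import math

def t_upper(a, alpha2, rho, D_e):
    """Upper bound t(a) on t = b/a from (A-7); assumes a > 0."""
    beta = alpha2 - rho ** 2
    # Clamp tiny negative values caused by rounding at the ends of the range R.
    disc = max(D_e - (a - 1.0) ** 2 * beta, 0.0)
    return ((1.0 - a) * rho + math.sqrt(disc)) / a

def optimal_coeffs(alpha2, rho, D_e):
    """Return (a*, b*) for y = a*x + b*u following (A-3), (A-8), (A-9); rho > 0."""
    beta = alpha2 - rho ** 2                 # alpha^2 - rho^2 >= 0
    if D_e >= beta:
        # Loose constraint: a = 0 is feasible and the covertext is erased.
        return 0.0, rho + math.sqrt(rho ** 2 - alpha2 + D_e)
    half = math.sqrt(D_e / beta)
    lo, hi = 1.0 - half, 1.0 + half          # range R of (A-3); lo > 0 here
    root = math.sqrt(D_e * rho ** 2 * beta * (alpha2 - D_e))
    a12 = [(beta * (alpha2 - D_e) + s * root) / (alpha2 * beta)
           for s in (1.0, -1.0)]             # stationary points (A-8)
    candidates = [a for a in a12 + [lo, hi] if lo <= a <= hi]
    a_star = max(candidates, key=lambda a: t_upper(a, alpha2, rho, D_e))
    return a_star, a_star * t_upper(a_star, alpha2, rho, D_e)

a_opt, b_opt = optimal_coeffs(1.0, 0.3, 0.2)   # illustrative numbers only
```

Because $b^* = a^* t(a^*)$ uses the upper root of (A-5), the returned pair meets the distortion constraint with equality, which gives a simple sanity check on the output.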
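Proof of Theorem. It is easy to show that under $H_1$

$$\rho_{uy}^2 = \frac{(|\rho| + \sqrt{D_e})^2}{\alpha^2 + 2\sqrt{D_e}|\rho| + D_e}, \qquad \text{(A-10)}$$

where $\alpha^2$ and $\rho$ are functions of the random vector $X$. By conditioning on $\alpha^2$, we can express the false-negative probability as

$$P_{fn}^{(n)} = \int_0^\infty \Pr\left\{ \rho_{uy}^2 < 1 - e^{-2\lambda} \,\middle|\, H_1, \alpha^2 = r \right\} p_{\alpha^2}(r)\, dr, \qquad \text{(A-11)}$$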



where $(n\alpha^2/\sigma^2)$ is $\chi^2$ distributed with $n$ degrees of freedom, and the probability density function of the $\chi^2$ distribution with $n$ degrees of freedom is given by

$$p_{\chi_n^2}(z) = \frac{(1/2)^{n/2}}{\Gamma(n/2)}\, z^{n/2-1} e^{-z/2}, \qquad z \ge 0,$$



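and $\Gamma(\cdot)$ denotes the Gamma function. Now, given $\alpha^2$, $D_e$ and a threshold value $\tau = 1 - e^{-2\lambda}$, let us find the range of $\rho$ for which $\rho_{uy}^2 < \tau$, i.e.,

$$\frac{(|\rho| + \sqrt{D_e})^2}{(|\rho| + \sqrt{D_e})^2 + (\alpha^2 - \rho^2)} < \tau. \qquad \text{(A-12)}$$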



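The function $\rho_{uy}^2(\rho)$ is symmetric about $\rho = 0$, monotonically increasing in $|\rho|$, and attains its minimum value $\frac{D_e}{D_e + \alpha^2}$ at $\rho = 0$. Hence, for $\alpha^2 < \frac{D_e(1-\tau)}{\tau}$, $\rho_{uy}^2$ is greater than $\tau$. After solving (A-12) with respect to $\rho$ and using the fact that $\tau < 1$, we get that $\rho_{uy}^2 < \tau$ implies that $|\rho| < \sqrt{D_e}(\tau - 1) + \sqrt{D_e\tau^2 + \tau\alpha^2 - \tau D_e}$ as long as $\alpha^2 > \frac{D_e(1-\tau)}{\tau}$. Define

$$\Theta(r) \triangleq \arccos\left( \frac{\sqrt{D_e}(\tau - 1) + \sqrt{D_e\tau^2 + \tau r - \tau D_e}}{\sqrt{r}} \right). \qquad \text{(A-13)}$$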



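It follows that

$$\Pr\left\{ \rho_{uy}^2 < \tau \,\middle|\, H_1, \alpha^2 = r \right\} = \Pr\left\{ |\rho| < \sqrt{D_e}(\tau - 1) + \sqrt{D_e\tau^2 + \tau r - \tau D_e} \,\middle|\, H_1, \alpha^2 = r \right\}$$
$$= 1 - \Pr\left\{ |\rho| \ge \sqrt{D_e}(\tau - 1) + \sqrt{D_e\tau^2 + \tau r - \tau D_e} \,\middle|\, H_1, \alpha^2 = r \right\} = 1 - \frac{2A_n(\Theta(r))}{A_n(\pi)},$$

where, to first order in the exponent,

$$\frac{A_n(\Theta(r))}{A_n(\pi)} \doteq e^{n \ln\sin(\Theta(r))}.$$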



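We note that $\Pr\left\{ \rho_{uy}^2 < \tau \,\middle|\, H_1, \alpha^2 \right\} = 0$ for $\alpha^2$ in the range $\left[0, \frac{D_e(1-\tau)}{\tau}\right]$. Therefore,

$$P_{fn}^{(n)} \doteq \frac{(1/2)^{n/2} n^{n/2}}{\Gamma(n/2)} \int_{\frac{D_e(1-\tau)}{\tau}}^{\infty} \frac{1}{r}\left[1 - e^{n\ln\sin(\Theta(r))}\right] e^{-\frac{nr}{2\sigma^2}} e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr$$
$$= \frac{(1/2)^{n/2} n^{n/2}}{\Gamma(n/2)} \int_{\frac{D_e(1-\tau)}{\tau}}^{\infty} \frac{1}{r}\, e^{-\frac{nr}{2\sigma^2}} e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr - \frac{(1/2)^{n/2} n^{n/2}}{\Gamma(n/2)} \int_{\frac{D_e(1-\tau)}{\tau}}^{\infty} \frac{1}{r}\, e^{n\ln\sin(\Theta(r))} e^{-\frac{nr}{2\sigma^2}} e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr. \qquad \text{(A-14)}$$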



Our next step is to evaluate the exponential decay rate of (A-14). It is easy to see that the first integral of (A-14) has the slower exponential decay rate and therefore dictates the overall decay rate. To evaluate the exponential decay rate of $P_{fn}^{(n)}$ as $n \to \infty$ we use Laplace's method for integrals. Therefore, we need to find the slowest exponential decay rate of the integrand within the limits of the integral. It is easy to show that

$$\lim_{n\to\infty} \frac{1}{n} \ln\left[ \frac{(1/2)^{n/2} n^{n/2}}{\Gamma(n/2)} \right] = \frac{1}{2}, \qquad \text{(A-15)}$$

and therefore the overall exponent is given by

$$E_{fn}(\tau, D_e) = \min_{r \ge \frac{D_e(1-\tau)}{\tau}} \frac{1}{2}\left[ \frac{r}{\sigma^2} - \ln\left(\frac{r}{\sigma^2}\right) - 1 \right]. \qquad \text{(A-16)}$$



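The function $g(r) = \left[ r/\sigma^2 - \ln(r/\sigma^2) - 1 \right]$, $r \in (0, \infty)$, achieves its minimum at $r = \sigma^2$, and $g(\sigma^2) = 0$. Therefore, in the case where $\frac{D_e(1-\tau)}{\tau} \le \sigma^2$, $E_{fn}(\tau, D_e) = 0$. Otherwise, the minimum of (A-16) is obtained at $r = \frac{D_e(1-\tau)}{\tau}$. Hence, the false-negative exponent of the sign embedder is given by

$$E_{fn}(\tau, D_e) = \begin{cases} \frac{1}{2}\left[ \frac{D_e(1-\tau)}{\tau\sigma^2} - \ln\left( \frac{D_e(1-\tau)}{\tau\sigma^2} \right) - 1 \right], & \frac{D_e(1-\tau)}{\tau} > \sigma^2 \\ 0, & \text{else.} \end{cases} \qquad \text{(A-17)}$$

Setting $\tau = 1 - e^{-2\lambda}$ yields the expression stated in the theorem. □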
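The closed form (A-17) can be cross-checked against a direct numerical evaluation of the dominant (first) integral in (A-14). The sketch below is an illustration under stated assumptions, not part of the original derivation: the function names are hypothetical, the integral is approximated by a midpoint sum in the log domain, and the check assumes $\frac{D_e(1-\tau)}{\tau} > \sigma^2$ so that the integrand decays monotonically over the integration window.

```python
import math

def sign_embedder_exponent(lam, D_e, sigma2):
    """Closed-form false-negative exponent (A-17) with tau = 1 - exp(-2*lam)."""
    tau = 1.0 - math.exp(-2.0 * lam)
    r0 = D_e * (1.0 - tau) / tau
    if r0 <= sigma2:
        return 0.0
    x = r0 / sigma2
    return 0.5 * (x - math.log(x) - 1.0)

def numeric_exponent(n, lam, D_e, sigma2, dr=1e-4, span=1.0):
    """-(1/n) ln of the first integral in (A-14), via a log-domain midpoint sum.

    Assumes r0 > sigma2, so truncating the integral at r0 + span is harmless.
    """
    tau = 1.0 - math.exp(-2.0 * lam)
    r0 = D_e * (1.0 - tau) / tau
    # ln of the constant (1/2)^{n/2} n^{n/2} / Gamma(n/2)
    log_c = 0.5 * n * math.log(0.5 * n) - math.lgamma(0.5 * n)
    logs = []
    for k in range(int(span / dr)):
        r = r0 + (k + 0.5) * dr
        logs.append(-math.log(r) - n * r / (2.0 * sigma2)
                    + 0.5 * n * math.log(r / sigma2))
    m = max(logs)                                   # log-sum-exp for stability
    log_int = m + math.log(sum(math.exp(v - m) for v in logs)) + math.log(dr)
    return -(log_c + log_int) / n

closed = sign_embedder_exponent(0.1, 1.0, 1.0)      # illustrative parameters
approx = numeric_exponent(2000, 0.1, 1.0, 1.0)
```

For finite $n$ the numerical value differs from the closed form by a term of order $(\ln n)/n$, so the two should agree only approximately.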



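Proof of Corollary. Since the false-negative probability of the improved embedder is zero for $\alpha^2 < D_e$, we can rewrite the integral (A-14) for the case where $\frac{1-\tau}{\tau} < 1$ (or $\lambda > \frac{1}{2}\ln 2$), in which the lower limit equals $D_e$ (and does not depend on $\lambda$), as follows:

$$P_{fn}^{(n)} \doteq \frac{(1/2)^{n/2} n^{n/2}}{\Gamma(n/2)} \int_{D_e}^{\infty} \frac{1}{r}\left[1 - e^{n\ln\sin(\Theta(r))}\right] e^{-\frac{nr}{2\sigma^2}} e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr. \qquad \text{(A-18)}$$

Optimizing using Laplace's method, as done in the proof of Theorem [3], leads to the stated result. □

Proof of Theorem. Given $\lambda > 0$, the false-negative probability is given by

$$P_{fn} = \Pr\left\{ \rho_{uy} < \sqrt{1 - e^{-2\lambda}} \,\middle|\, H_1 \right\} = \Pr\left\{ \rho_{uy} < T \,\middle|\, H_1 \right\}, \qquad \text{(A-19)}$$

where $T \triangleq \sqrt{1 - e^{-2\lambda}}$ and the normalized correlation, under $H_1$, is given by

$$\rho_{uy} = \frac{\rho + \sqrt{D_e}}{\sqrt{\alpha^2 + 2\sqrt{D_e}\rho + D_e}}. \qquad \text{(A-20)}$$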



The function $\rho_{uy}(\rho)$ achieves its minimum at $\rho = -\alpha^2/\sqrt{D_e}$. Since $\rho \in [-\alpha, \alpha]$, we conclude that in the case where $\alpha^2 > D_e$, $\rho_{uy} < T$ implies that $\rho < \sqrt{D_e}(T^2 - 1) + T\sqrt{\alpha^2 - D_e(1 - T^2)}$ ($\rho_{uy}(\rho)$ is monotonically increasing in $\rho$, and $\rho_{uy}(-\alpha) = -1$). If $(1 - T^2)D_e < \alpha^2 \le D_e$, $\rho_{uy} < T$ implies that

$$\sqrt{D_e}(T^2 - 1) - T\sqrt{\alpha^2 - D_e(1 - T^2)} < \rho < \sqrt{D_e}(T^2 - 1) + T\sqrt{\alpha^2 - D_e(1 - T^2)}.$$



Footnote: Laplace's method is a general technique for obtaining the asymptotic behavior of integrals of the form $I(x) = \int_a^b f(t) e^{x\Phi(t)}\, dt$ as $x \to \infty$. In this case $c \in [a, b]$, the maximizer of $\Phi(t)$ in the interval $[a, b]$, dictates the asymptotic behavior of the integral (assuming that $f(c) \ne 0$), or in the above case:

$$\lim_{n\to\infty} -\frac{1}{n} \ln \int_a^b f(t) e^{-n\Phi(t)}\, dt = \min_{t \in [a,b]} \Phi(t).$$

See [34, Sec. 6.4], [35, Ch. 4] for more information.
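Otherwise, for $\alpha^2 \le (1 - T^2)D_e$, $\rho_{uy} > T$ for all $\rho \in [-\alpha, \alpha]$. Define

$$\Psi_1(r) \triangleq \arccos\left( \frac{\sqrt{D_e}(T^2 - 1) + T\sqrt{r - D_e(1 - T^2)}}{\sqrt{r}} \right), \qquad \text{(A-21)}$$

$$\Psi_2(r) \triangleq \arccos\left( \frac{\sqrt{D_e}(T^2 - 1) - T\sqrt{r - D_e(1 - T^2)}}{\sqrt{r}} \right). \qquad \text{(A-22)}$$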



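We need to pay attention to the point $r_0 = \frac{D_e(1-T^2)}{T^2}$, at which $\Psi_1(r_0) = \pi/2$. Beyond that point ($r > r_0$), the probability of a false negative given $\alpha^2 = r$ goes to one as $n$ tends to infinity. Therefore, the false-negative probability can be written as follows. In the case where $\frac{1-T^2}{T^2} > 1$ (or $\lambda < \frac{1}{2}\ln 2$),

$$P_{fn}^{(n)} \doteq \frac{(1/2)^{n/2} n^{n/2}}{\Gamma(n/2)} \Bigg[ \int_{D_e(1-T^2)}^{D_e} \frac{1}{r}\left( e^{n\ln\sin(\Psi_1(r))} - e^{n\ln\sin(\Psi_2(r))} \right) e^{-\frac{nr}{2\sigma^2}} e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr$$
$$+ \int_{D_e}^{\frac{D_e(1-T^2)}{T^2}} \frac{1}{r}\, e^{n\ln\sin(\Psi_1(r))} e^{-\frac{nr}{2\sigma^2}} e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr \qquad \text{(A-23)}$$
$$+ \int_{\frac{D_e(1-T^2)}{T^2}}^{\infty} \frac{1}{r}\left( 1 - e^{n\ln\sin(\Psi_1(r))} \right) e^{-\frac{nr}{2\sigma^2}} e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr \Bigg].$$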



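The first integral in (A-23) represents the false-negative probability when both $\Psi_1(r)$ and $\Psi_2(r)$ are greater than $\pi/2$. In this case, we need to subtract the areas of two caps, i.e., $\frac{A_n(\pi - \Psi_1(r)) - A_n(\pi - \Psi_2(r))}{A_n(\pi)}$. The second integral in (A-23) stems from the fact that for $r > D_e$ the false-negative probability (given $\alpha^2 = r$) equals $\frac{A_n(\pi - \Psi_1(r))}{A_n(\pi)}$. The last integral in (A-23) stems from the fact that the false-negative probability (given $\alpha^2 = r$) equals $1 - \frac{A_n(\Psi_1(r))}{A_n(\pi)}$. In a similar way, in the case where $\frac{1-T^2}{T^2} < 1$ (or $\lambda > \frac{1}{2}\ln 2$),

$$P_{fn}^{(n)} \doteq \frac{(1/2)^{n/2} n^{n/2}}{\Gamma(n/2)} \Bigg[ \int_{D_e(1-T^2)}^{\frac{D_e(1-T^2)}{T^2}} \frac{1}{r}\left( e^{n\ln\sin(\Psi_1(r))} - e^{n\ln\sin(\Psi_2(r))} \right) e^{-\frac{nr}{2\sigma^2}} e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr$$
$$+ \int_{\frac{D_e(1-T^2)}{T^2}}^{D_e} \frac{1}{r}\left( 1 - e^{n\ln\sin(\Psi_1(r))} - e^{n\ln\sin(\Psi_2(r))} \right) e^{-\frac{nr}{2\sigma^2}} e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr \qquad \text{(A-24)}$$
$$+ \int_{D_e}^{\infty} \frac{1}{r}\left( 1 - e^{n\ln\sin(\Psi_1(r))} \right) e^{-\frac{nr}{2\sigma^2}} e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr \Bigg].$$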



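Since we are interested in the exponential decay rate (to the first order), the slowest exponent dictates the overall exponential behavior. Therefore, the fact that $\sin(\Psi_1(r)) > \sin(\Psi_2(r))$ for $D_e(1-T^2) < r < D_e(1-T^2)/T^2$ implies that

$$P_{fn} \doteq \frac{(1/2)^{n/2} n^{n/2}}{\Gamma(n/2)} \Bigg[ \int_{D_e(1-T^2)}^{\frac{D_e(1-T^2)}{T^2}} \frac{1}{r}\, e^{n\ln\sin(\Psi_1(r))} e^{-\frac{nr}{2\sigma^2}} e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr + \int_{\frac{D_e(1-T^2)}{T^2}}^{\infty} \frac{1}{r}\, e^{-\frac{nr}{2\sigma^2}} e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr \Bigg]. \qquad \text{(A-25)}$$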

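Again, using Laplace's method for integrals [33, Ch. 4], we can conclude that

$$E_{fn}(T, D_e) = \min\left[ E_1(T, D_e),\ E_2(T, D_e) \right], \qquad \text{(A-26)}$$

where

$$E_1(T, D_e) = \min_{D_e(1-T^2) < r \le \frac{D_e(1-T^2)}{T^2}} \frac{1}{2}\left[ \frac{r}{\sigma^2} - \ln\left(\frac{r}{\sigma^2}\right) - 2\ln\sin(\Psi_1(r)) - 1 \right], \qquad \text{(A-27)}$$

$$E_2(T, D_e) = \min_{r \ge \frac{D_e(1-T^2)}{T^2}} \frac{1}{2}\left[ \frac{r}{\sigma^2} - \ln\left(\frac{r}{\sigma^2}\right) - 1 \right]. \qquad \text{(A-28)}$$

$E_2(T, D_e)$ is given by

$$E_2(T, D_e) = \begin{cases} \frac{1}{2}\left[ \frac{D_e(1-T^2)}{T^2\sigma^2} - \ln\left( \frac{D_e(1-T^2)}{T^2\sigma^2} \right) - 1 \right], & \frac{D_e(1-T^2)}{T^2} > \sigma^2 \\ 0, & \text{else.} \end{cases} \qquad \text{(A-29)}$$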



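Since $T^2 = 1 - e^{-2\lambda}$, $E_2(\lambda, D_e)$ coincides with the false-negative exponent of the sign embedder given in (A-17), and therefore the exponent $E_{fn}(\lambda, D_e)$ of the additive embedder does not exceed that of the sign embedder. Our next step will be to prove that $E_1(T, D_e) < E_2(T, D_e)$ when $\frac{D_e(1-T^2)}{T^2} > \sigma^2$ (otherwise, $E_{fn}(T, D_e) = 0$). Define

$$f(r) = \frac{r}{2\sigma^2} - \frac{1}{2}\ln\left(\frac{r}{\sigma^2}\right) - \ln\sin(\Psi_1(r)) - \frac{1}{2}. \qquad \text{(A-30)}$$

$f(r)$ is a continuous, non-negative function in the range $D_e(1-T^2) < r \le \frac{D_e(1-T^2)}{T^2}$. Clearly,

$$E_1(T, D_e) \le f\left( \frac{D_e(1-T^2)}{T^2} \right) = E_2(T, D_e). \qquad \text{(A-31)}$$

In addition, $f'(r)$ is continuous in the above range. It can easily be shown that

$$f'\left( \frac{D_e(1-T^2)}{T^2} \right) = \frac{1}{2\sigma^2}\left[ 1 - \frac{T^2\sigma^2}{D_e(1-T^2)} \right] > 0; \qquad \text{(A-32)}$$

hence, $f(r)$ is monotonically increasing in a small neighborhood of $\frac{D_e(1-T^2)}{T^2}$, and therefore $E_1(T, D_e) < E_2(T, D_e)$. This fact leads to the conclusion that the false-negative exponent of the additive embedder is strictly smaller than that of the sign embedder. The exact value of $E_1(T, D_e)$ is cumbersome and therefore will not be presented. □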
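Although $E_1(T, D_e)$ has no convenient closed form, the comparison between $E_1$ and $E_2$ can be illustrated numerically by a grid search over the range in (A-27). The sketch below is only a sanity check under assumptions: the function name `additive_exponents`, the grid resolution, and the parameter values are hypothetical choices, and $\Psi_1(r)$ is taken from (A-21) with $T^2 = 1 - e^{-2\lambda}$.

```python
import math

def additive_exponents(lam, D_e, sigma2, grid=20000):
    """Return (E1, E2) of (A-27)-(A-29), with E1 found by grid search."""
    T2 = 1.0 - math.exp(-2.0 * lam)
    T = math.sqrt(T2)
    r_lo = D_e * (1.0 - T2)          # lower end of the range in (A-27)
    r0 = r_lo / T2                   # point where Psi_1(r0) = pi/2

    def f(r):
        # cos(Psi_1(r)) from (A-21); sin^2 = 1 - cos^2, clamped against rounding.
        c = (math.sqrt(D_e) * (T2 - 1.0)
             + T * math.sqrt(max(r - r_lo, 0.0))) / math.sqrt(r)
        sin2 = max(1.0 - c * c, 1e-300)
        return (r / (2.0 * sigma2) - 0.5 * math.log(r / sigma2)
                - 0.5 * math.log(sin2) - 0.5)

    e1 = min(f(r_lo + (r0 - r_lo) * k / grid) for k in range(1, grid + 1))
    x = r0 / sigma2
    e2 = 0.5 * (x - math.log(x) - 1.0) if r0 > sigma2 else 0.0
    return e1, e2

e1, e2 = additive_exponents(0.1, 1.0, 1.0)   # illustrative parameters
e_additive = min(e1, e2)                     # exponent (A-26)
```

With these illustrative parameters, $\frac{D_e(1-T^2)}{T^2} > \sigma^2$, so the search should return $E_1 < E_2$, in line with the argument above.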